Data Structure Basics pt.2

Table of Contents

Review

In our previous post on Data Structure Basics, we defined a data structure. We also introduce the struct. In this post we get to look into arrays.

Arrays

At its core, an array is formed by multiple independent data objects enclosed inside one named container. So, not because the array has a name it means it’s data objects get one. Instead, each data object has a sequential index. In other words, an array is an ordered collection of items.

Simple Arrays

Simple arrays usually have the following properties:

  • Zero-based index.
  • Fixed size (immutable).
  • Specific data type.

One-dimensional Arrays

This kind of array only holds atomic distinct values. Each value has an index attach to it. We can think of an index as the address where we can consistently find a value.

Two-dimensional Arrays

Arrays with two dimensions are usually called a matrix or table. Even though they are usually explain in terms of rows and columns of information we are better of thinking of them as an array of arrays. That is, all data objects within a named container are arrays.

In order to access an item in a two dimensional array we need two indexes. The first one to specify which of the many arrays inside our named container holds the data we are after. The second to find the actual data object.

Multidimensional Arrays

Multidimensional arrays are a generalization of the two dimensional ones. In multidimensional arrays we need as many indexes as dimensions there are to access the data contained therein. Each of those indexes show where in the respective nested array we can find either the following nested array or the data object we are after. In other words, each index represent the spot in its own dimension where it can access the next dimension, and so on, until we reach the desired object.

Jagged Arrays

There are times when a multidimensional array should nest arrays of different sizes. For instance, if we create a two dimensional array to store the total of tickets sold per day in a year. By the traditional definition of an array, since we would have to use one-size arrays to store the sales. This would create impossibilities such as February the 31th, which can only backfire on us.

For those kind of situations is better to use a Jagged Array; a multidimensional array which contains different size dimensions. In pseudo code:

ticket_sales = Array.new(12) do
  for each month in ticket_sales
    if april, june, september, november
      Array.new(30)
    elsif february && leap_year
      Array.new(29)
    elsif february && not_leap_year
      Array.new(28)
    else
      Array.new(31)
    end

    add array to ticket_sales[month]
    next month
  end
end

Array Actions

Arrays should allow us to:

  • Append objects at the end.
  • Insert objects at specific index.
  • Remove objects.
  • Sort own objects.

Sorting an Array

Many languages come with built-in sorting algorithm. In general, they are well tested and implemented. We should familiarize with its internals and keep in mind the cases when it might not be what we need.

Sorting Custom Objects

More often than not, the sorting algorithms that come with a language won’t work on arrays of custom objects. If we want our array to be able to sort itself we need to make sure to implement the necessary comparison functions in our custom object. That way when the sorting algorithm commands an object to compare itself to another it will know how do so. The pseudo code for such a function is:

# For basic comparisons
PseudoCompare(a, b)
  return -1 if a.value < b.value
  return 1 if a.value > b.value
  0
end

# For complex comparisons
PseudoCompare(a, b)
  return -1 if a.value < b.value
  return 1 if a.value > b.value

  if a.value == b.value
    return -1 if a.attribute < b.attribute
    return 1 if a.attribute > b.attribute
    0
  end
end

Beware that the complex comparisons can be implemented in different ways. For instance, another pseudo code for the example above could be:

PseudoCompare(a, b)
  return -1 if a.value < b.value
  return 1 if a.value > b.value
  return -1 if a.attribute < b.attribute
  return 1 if a.attribute > b.attribute
  0
end

The second pseudo code is less complex since it doesn’t have to check for a.value == b.value. That check gets implicitly passed by failing the first two: a.value < b.value and a.value > b.value.

That might not be a huge improvement for an algorithm like this but for many others a simple refactoring could mean a great deal. Let us just say that this is where algorithms and complexity theory become relevant. Just beware of premature optimization and code extraction.

Sorting Custom Objects in Ruby

In order to be able to sort our custom objects using Ruby built-in sorting algorithms our objects must have their own implementation of the spaceship operator; #<=>(other). Then its only a matter of mixing in the Comparable module.

Searching for Existence/Location

Searching, just like sorting, is a complex subject. Usually we should use the searching capabilities built into the language of our choice. If circumstances require us to implement our own algorithm consider Big O notation and other methods to check on your implementation’s efficiency. Here are two different algorithms for searching for an object in an array.

In this searching approach we begin with the first element of an array and check every single value for the one we are looking for. Linear searching becomes slower, in a steady fashion, as the number of items in an array grow. Here’s a lovely gif showing linear search.

Binary here simply means two pieces. It describes the way this algorithm works. First, a binary search finds the mid point of the array. Then it compares the object in the mid element against the given object. If they are not the same, the algorithm continues searching to the right or left of the mid element depending on whether the given object is higher or lower than it. This algorithm repeats those same steps over and over again until either it finds the object or not.

For instance, granted we are looking for the object 31 within an array. First, we determine the mid of the array.

After comparing the mid point to 31 we discard the lower half of the array.

We determine the mid point of our smaller array. We repeat until we find the 31.

Due to its mechanics this type of searches only works on ordered arrays.

This type of algorithm is also known as:

  • Half-interval search
  • Divide-and-conquer algorithm

Just like there are many ways to search for objects in an array. Those different algorithms could be used for several different data structures. Hence, what might work for one data structure might not be the best for another.

Arrays in Ruby

As we’ll see later, arrays in Ruby behave like arrays, stacks, and queues, at the same time by implementing their basic behavior. For instance, since one of the first features added to arrays is the ability to dynamically add or remove elements while the program is running Ruby implements that behavior for us. Another such feature is size mutability.

Due to Ruby’s dynamic nature, Ruby arrays are a good fit for jagged arrays of small dimensions. For larger ones, though, we should consider creating traditional arrays for each type of dimension.

While Ruby arrays are flexible keep in mind that flexibility adds overhead; it’s achieved at the expense of memory and performance. That might not be a problem for an array with a handful of items. But as its size increases and/or it becomes more volatile it can make a big difference.

Array Actions in Ruby

The following method calls are mere examples of how we can perform traditional array actions onto ruby arrays.

Append: #push(object) Insert: #insert(index, object) Remove: #pop the last element or delete_at(index) Sort: #sort Search: #index(object) Binary Search: #bsearch

Before We Go

We covered a lot of concepts regarding traditional arrays. We also manage to describe a few searching algorithms and mentioned their importance. Finally, we talked a bit about ruby arrays.

Next time we’ll talk about lists, stacks, queues, and others.