user1762229

Reputation: 71

ruby syntax code involving hashes

I was looking at code regarding how to return a mode from an array and I ran into this code:

def mode(array)
  answer = array.inject({}) { |k, v| k[v] = array.count(v); k }
  answer.select { |k, v| v == answer.values.max }.keys
end

I'm trying to understand what this syntax means. I am fairly new to Ruby and don't quite understand how hashes are being used here. Any help would be greatly appreciated.

Upvotes: 0

Views: 73

Answers (3)

Cary Swoveland

Reputation: 110645

I believe your question has been answered, and @Mark mentioned different ways to do the calculations. I would like to just focus on other ways to improve the first line of code:

answer = array.inject({}) { |k, v| k[v] = array.count(v); k }

First, let's create some data:

array = [1,2,1,4,3,2,1]

Use each_with_object instead of inject

My suspicion is that the code might be fairly old, as Enumerable#each_with_object, which was introduced in v. 1.9, is arguably a better choice here than Enumerable#inject (aka reduce). If we were to use each_with_object, the first line would be:

answer = array.each_with_object({}) { |v,k| k[v] = array.count(v) }
  #=> {1=>3, 2=>2, 4=>1, 3=>1}

each_with_object returns the object, a hash held by the block variable k.

As you see, each_with_object is very similar to inject, the only differences being:

  • it is not necessary to return k from the block to each_with_object, as it is with inject (the reason for that annoying ; k at the end of inject's block);
  • the block variable for the object (k) follows v with each_with_object, whereas it precedes v with inject; and
  • when not given a block, each_with_object returns an enumerator, meaning it can be chained to other methods (e.g., arr.each_with_object({}).with_index { |(v,k),i| ... }).
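A small sketch of the three differences listed above, using the same data. The variable names with_inject, with_ewo and indexed are illustrative only; the with_index example records the first index at which each value appears, just to show that the chained call still returns the hash.

```ruby
array = [1, 2, 1, 4, 3, 2, 1]

# inject: the block's return value becomes the next memo, so the trailing `; k` is required.
with_inject = array.inject({}) { |k, v| k[v] = array.count(v); k }

# each_with_object: the block's return value is ignored; the same hash object
# is handed to every iteration and returned at the end. Note the swapped order |v, k|.
with_ewo = array.each_with_object({}) { |v, k| k[v] = array.count(v) }

puts with_inject == with_ewo  # true

# Without a block, each_with_object returns an enumerator, so it can be chained.
# with_index yields [element, memo] plus the index; the memo hash is returned.
indexed = array.each_with_object({}).with_index { |(v, h), i| h[v] ||= i }
puts indexed == { 1 => 0, 2 => 1, 4 => 3, 3 => 4 }  # true: first index of each value
```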

Don't get me wrong, inject remains an extremely powerful method, and in many situations it has no peer.

Two more improvements

In addition to replacing inject with each_with_object, let me make two other changes:

answer = array.uniq.each_with_object({}) { |k,h| h[k] = array.count(k) }
  #=> {1=>3, 2=>2, 4=>1, 3=>1} 

In the original expression, the object returned by inject (sometimes called the "memo") was represented by the block variable k, whereas I am using k to represent a hash key ("k" for "key"). Similarly, as the object is a hash, I chose to use h for its block variable. Like many others, I prefer to keep block variables short and use names that indicate object type (e.g., a for array, h for hash, s for string, sym for symbol, and so on).

Now suppose:

array = [1,1]

then inject would pass the first 1 into the block and compute k[1] = array.count(1) #=> 2, so the hash k returned to inject would be {1=>2}. It would then pass the second 1 into the block and again compute k[1] = array.count(1) #=> 2, overwriting 1=>2 in k with 1=>2; that is, not changing it at all. Doesn't it make more sense to do this only for the unique values of array? That's why I have: array.uniq....
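To see the saving concretely, here is a sketch that counts how many times the block (and therefore array.count) runs with and without uniq. The counters calls_all and calls_uniq are hypothetical instrumentation added only for this illustration.

```ruby
array = [1, 2, 1, 4, 3, 2, 1]

calls_all = 0
from_all = array.each_with_object({}) { |k, h| calls_all += 1; h[k] = array.count(k) }

calls_uniq = 0
from_uniq = array.uniq.each_with_object({}) { |k, h| calls_uniq += 1; h[k] = array.count(k) }

puts from_all == from_uniq  # true: the resulting hash is the same
puts calls_all              # 7: one count call per element
puts calls_uniq             # 4: one count call per distinct element
```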

Even better: use a counting hash

This is still quite inefficient, because array.count traverses the whole array once for every distinct value. Here's a way that reads better and is probably more efficient, since it makes a single pass:

array.each_with_object(Hash.new(0)) { |k,h| h[k] += 1 }
  #=> {1=>3, 2=>2, 4=>1, 3=>1} 
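Combining the counting hash with the question's second line gives a complete mode method. This is a sketch; the name mode_with_counting_hash is hypothetical, and computing the maximum once (rather than calling answer.values.max inside select for every key) is a small change from the question's code. On Ruby 2.7 and later, array.tally builds the same counting hash.

```ruby
# Hypothetical helper: returns every value that occurs most often.
def mode_with_counting_hash(array)
  counts = array.each_with_object(Hash.new(0)) { |k, h| h[k] += 1 }
  max = counts.values.max                # highest count, computed once
  counts.select { |_k, v| v == max }.keys
end

p mode_with_counting_hash([1, 2, 1, 4, 3, 2, 1])  # [1]
p mode_with_counting_hash([1, 1, 2, 2, 3])        # [1, 2] (a tie)
p mode_with_counting_hash([])                     # []
```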

Let's have a look at this in gory detail. Firstly, the docs for Hash#new read, "If obj is specified [i.e., Hash.new(obj)], this single object will be used for all default values." This means that if:

h = Hash.new('cat')

and h does not have the key 'dog', then:

h['dog'] #=> 'cat'

Important: The last expression is often misunderstood. It merely returns the default value. str = "It does *not* add the key-value pair 'dog'=>'cat' to the hash." Let me repeat that: puts str.
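A quick way to convince yourself is to ask the hash whether the key exists after reading it. The counts variable below is just for illustration; it shows that with Hash.new(0), it is the assignment inside h[k] += 1 that stores the key, not the read of the default.

```ruby
h = Hash.new('cat')

puts h['dog']         # cat: the default value is returned...
puts h.key?('dog')    # false: ...but no key was added
puts h.empty?         # true

# `counts[:x] += 1` reads the default 0, then *assigns* 1, which does store the key.
counts = Hash.new(0)
counts[:x] += 1
puts counts.key?(:x)  # true
puts counts[:x]       # 1
```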

Now let's see what's happening here:

enum = array.each_with_object(Hash.new(0))
  #=> #<Enumerator: [1, 2, 1, 4, 3, 2, 1]:each_with_object({})> 

We can see the contents of the enumerator by converting it to an array:

enum.to_a
  #=> [[1, {}], [2, {}], [1, {}], [4, {}], [3, {}], [2, {}], [1, {}]] 

These seven elements are passed into the block by the method each:

enum.each { |k,h| h[k] += 1 }
  #=> {1=>3, 2=>2, 4=>1, 3=>1}

Pretty cool, eh?

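We can simulate this using Enumerator#next. (Re-create enum first: each_with_object hands the same hash object to every pass, so after the each call above that hash already holds the final counts.) The first value of enum ([1, {}]) is passed to the block and assigned to the block variables: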

k,h = enum.next
  #=> [1, {}] 
k #=> 1 
h #=> {} 

and we compute:

h[k] += 1
  #=> h[k] = h[k] + 1  (what '+=' means)
  #        = 0 + 1 = 1 (h[k] on the right equals the default value
  #                     of 0 since `h` has no key `k`)

so now:

h #=> {1=>1}

Next, each passes the second value of enum into the block and similar calculations are performed:

k,h = enum.next
  #=> [2, {1=>1}] 
k #=> 2 
h #=> {1=>1} 
h[k] += 1
  #=> 1 
h #=> {1=>1, 2=>1} 

Things are a little different when the third element of enum is passed in, because h now has a key 1:

k,h = enum.next
  #=> [1, {1=>1, 2=>1}] 
k #=> 1 
h #=> {1=>1, 2=>1} 
h[k] += 1
  #=> h[k] = h[k] + 1
  #=> h[1] = h[1] + 1  
  #=> h[1] = 1 + 1 = 2
h #=> {1=>2, 2=>1}

The remaining calculations are performed similarly.
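The remaining steps can be run mechanically with Enumerator#next inside Kernel#loop, which stops quietly when next raises StopIteration after the last element. In this sketch the hash is created up front under the name counts (not in the original) so it can be inspected afterwards; because each_with_object passes that same object to every step, the h received from next is counts itself.

```ruby
array = [1, 2, 1, 4, 3, 2, 1]
counts = Hash.new(0)
enum = array.each_with_object(counts)

loop do
  k, h = enum.next  # e.g. [1, {}] on the first call; h is the same object as counts
  h[k] += 1
end                 # loop rescues StopIteration after the 7th element

puts(counts == { 1 => 3, 2 => 2, 4 => 1, 3 => 1 })  # true
```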

Upvotes: 0

Mark Thomas

Reputation: 37507

Line by line:

answer = array.inject({}) { |k, v| k[v] = array.count(v); k }

This assembles a hash of counts. I would not have called the variable answer, because it is not the answer; it is an intermediate step. The inject() method (also known as reduce()) lets you iterate over a collection while keeping an accumulator (e.g., a running total or, in this case, a hash collecting counts). It needs a starting value of {} so that the hash exists when the first count is stored. Given the array [1,2,2,2,3,4,5,6,6], the counts would look like this: {1=>1, 2=>3, 3=>1, 4=>1, 5=>1, 6=>2}.
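Here is a sketch of that first step on the example array, written with a do...end block so each part can be commented, plus what happens if the {} starting value is left out. The names acc and error are illustrative only.

```ruby
array = [1, 2, 2, 2, 3, 4, 5, 6, 6]

# {} is the initial accumulator; each block call stores one count and
# returns the hash so it becomes the accumulator for the next element.
counts = array.inject({}) do |acc, v|
  acc[v] = array.count(v)
  acc
end
puts(counts == { 1 => 1, 2 => 3, 3 => 1, 4 => 1, 5 => 1, 6 => 2 })  # true

# Without {}, inject uses the first element (the Integer 1) as the accumulator,
# and Integer has no []= method.
error = begin
  array.inject { |acc, v| acc[v] = array.count(v); acc }
  nil
rescue NoMethodError => e
  e
end
puts error.class  # NoMethodError
```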

answer.select { |k,v| v == answer.values.max}.keys

This selects all pairs in the above hash whose value (count) equals the maximum count. Then .keys returns the elements associated with that maximum. Note that it returns multiple elements if they share the maximum count.
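A sketch of the second line on a small tie case, split into named steps (max, top and modes are illustrative names). It shows that select on a Hash returns a Hash, and .keys turns it into the list of modes.

```ruby
counts = { 1 => 2, 2 => 2, 3 => 1 }        # counts for [1, 1, 2, 2, 3]

max   = counts.values.max                  # 2
top   = counts.select { |_k, v| v == max } # still a Hash: only the pairs with count 2
modes = top.keys                           # both 1 and 2 tie for the mode

p modes  # [1, 2]
```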

An alternative:

If you didn't care about returning multiple modes, you could use group_by as follows:

array.group_by{|x|x}.values.max_by(&:size).first

or, in Ruby 2.2+:

array.group_by(&:itself).values.max_by(&:size).first
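Broken into steps on the example array (Object#itself needs Ruby 2.2+), this sketch shows the intermediate hash of groups and why only one mode comes back on a tie. The names groups, largest, mode and tie_pick are illustrative; no claim is made here about which of the tied values max_by picks.

```ruby
array = [1, 2, 2, 2, 3, 4, 5, 6, 6]

# Each value maps to the array of its occurrences:
# 1 => [1], 2 => [2, 2, 2], 3 => [3], 4 => [4], 5 => [5], 6 => [6, 6]
groups = array.group_by(&:itself)

largest = groups.values.max_by(&:size)  # [2, 2, 2]
mode    = largest.first                 # 2
p mode

# max_by returns a single group, so with a tie you get just one of the modes.
tie_pick = [1, 1, 2, 2].group_by(&:itself).values.max_by(&:size).first
```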

Upvotes: 2

itdoesntwork

Reputation: 4792

The inject method acts like an accumulator. Here is a simpler example:

sum = [1,2,3].inject(0) { |current_tally, new_value| current_tally + new_value }

The 0 is the starting point.
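To watch the accumulator change, this sketch records each (current_tally, new_value) pair the block receives; the steps array is added only for illustration.

```ruby
steps = []
sum = [1, 2, 3].inject(0) do |current_tally, new_value|
  steps << [current_tally, new_value]
  current_tally + new_value  # this return value becomes the next current_tally
end

p steps  # [[0, 1], [1, 2], [3, 3]]
p sum    # 6
```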

So, in the question's code, after the first line we have a hash that maps each number to the number of times it appears.

The mode is the most frequent element, and that is what the next line finds: it selects only the entries whose count equals the maximum, then returns their keys.

Upvotes: 0
