Reputation: 71
I was looking at code regarding how to return a mode from an array and I ran into this code:
def mode(array)
  answer = array.inject ({}) { |k, v| k[v]=array.count(v);k}
  answer.select { |k,v| v == answer.values.max}.keys
end
I'm trying to understand what this syntax means, as I am fairly new to Ruby and don't quite understand how hashes are being used here. Any help would be greatly appreciated.
Upvotes: 0
Views: 73
Reputation: 110645
I believe your question has been answered, and @Mark mentioned different ways to do the calculations. I would like to just focus on other ways to improve the first line of code:
answer = array.inject ({}) { |k, v| k[v] = array.count(v); k }
First, let's create some data:
array = [1,2,1,4,3,2,1]
Use each_with_object instead of inject
My suspicion is that the code might be fairly old, as Enumerable#each_with_object, which was introduced in v. 1.9, is arguably a better choice here than Enumerable#inject (aka reduce). If we were to use each_with_object, the first line would be:
answer = array.each_with_object({}) { |v,k| k[v] = array.count(v) }
#=> {1=>3, 2=>2, 4=>1, 3=>1}
each_with_object returns the object, a hash held by the block variable k.
As you see, each_with_object is very similar to inject, the only differences being:
it is not necessary to return the object from the block to each_with_object, as it is with inject (the reason for that annoying ; k at the end of inject's block);
the block variable for the object (k) follows v with each_with_object, whereas it precedes v with inject; and
when called without a block, each_with_object returns an enumerator, meaning it can be chained to other methods (e.g., arr.each_with_object({}).with_index ...).
Don't get me wrong, inject remains an extremely powerful method, and in many situations it has no peer.
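To make the comparison concrete, here is a small side-by-side sketch using the same array as above. The variable names with_inject, with_ewo and error are only for illustration.

```ruby
array = [1, 2, 1, 4, 3, 2, 1]

# inject: the memo (k) comes first, and the block must return it
with_inject = array.inject({}) { |k, v| k[v] = array.count(v); k }

# each_with_object: the memo (k) comes second, and the block's
# return value is ignored
with_ewo = array.each_with_object({}) { |v, k| k[v] = array.count(v) }

p with_inject == with_ewo  # true

# Dropping the trailing "; k" makes the block return the count (an
# Integer), which then becomes the memo on the next iteration
error = begin
  array.inject({}) { |k, v| k[v] = array.count(v) }
  nil
rescue NoMethodError => e
  e
end
p error.class  # NoMethodError
```

The failure in the last part is why the trailing k is needed with inject: the value of the block becomes the memo for the next element.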
Two more improvements
In addition to replacing inject with each_with_object, let me make two other changes:
answer = array.uniq.each_with_object({}) { |k,h| h[k] = array.count(k) }
#=> {1=>3, 2=>2, 4=>1, 3=>1}
In the original expression, the object returned by inject (sometimes called the "memo") was represented by the block variable k, which I am using to represent a hash key ("k" for "key"). Similarly, as the object is a hash, I chose to use h for its block variable. Like many others, I prefer to keep the block variables short and use names that indicate object type (e.g., a for array, h for hash, s for string, sym for symbol, and so on).
Now suppose:
array = [1,1]
then inject would pass the first 1 into the block and then compute k[1] = array.count(1) #=> 2, so the hash k returned to inject would be {1=>2}. It would then pass the second 1 into the block, again compute k[1] = array.count(1) #=> 2, overwriting 1=>2 in k with 1=>2; that is, not changing it at all. Doesn't it make more sense to just do this for the unique values of array? That's why I have: array.uniq...
Even better: use a counting hash
This is still quite inefficient, as each of those calls to count traverses the entire array. Here's a way that reads better and is probably more efficient, since it makes a single pass over the array:
array.each_with_object(Hash.new(0)) { |k,h| h[k] += 1 }
#=> {1=>3, 2=>2, 4=>1, 3=>1}
Let's have a look at this in gory detail. Firstly, the docs for Hash#new read, "If obj is specified [i.e., Hash.new(obj)], this single object will be used for all default values." This means that if:
h = Hash.new('cat')
and h does not have a key 'dog', then:
h['dog'] #=> 'cat'
Important: The last expression is often misunderstood. It merely returns the default value. str = "It does *not* add the key-value pair 'dog'=>'cat' to the hash."
Let me repeat that: puts str.
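If you want to convince yourself of this, here is a short sketch you can run. The counts hash and the :a key are just made up for the demonstration.

```ruby
h = Hash.new('cat')

p h['dog']       # "cat" (the default value)
p h.key?('dog')  # false: reading the default did not add the key
p h.size         # 0

h['dog'] = 'woof'  # only an explicit assignment stores a pair
p h.key?('dog')    # true

# This is what makes a counting hash work: += reads the default (0)
# and then assigns, so the key is created on the first increment
counts = Hash.new(0)
counts[:a] += 1    # counts[:a] = counts[:a] + 1, i.e. 0 + 1
p counts[:a]       # 1
p counts.key?(:a)  # true
```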
Now let's see what's happening here:
enum = array.each_with_object(Hash.new(0))
#=> #<Enumerator: [1, 2, 1, 4, 3, 2, 1]:each_with_object({})>
We can see the contents of the enumerator by converting it to an array:
enum.to_a
#=> [[1, {}], [2, {}], [1, {}], [4, {}], [3, {}], [2, {}], [1, {}]]
These seven elements are passed into the block by the method each:
enum.each { |k,h| h[k] += 1 }
#=> {1=>3, 2=>2, 4=>1, 3=>1}
Pretty cool, eh?
We can simulate this using Enumerator#next. The first value of enum ([1, {}]) is passed to the block and assigned to the block variables:
k,h = enum.next
#=> [1, {}]
k #=> 1
h #=> {}
and we compute:
h[k] += 1
#=> h[k] = h[k] + 1 (what '+=' means)
#        = 0 + 1 = 1 (h[k] on the right equals the default value
#                     of 0 since `h` has no key `k`)
so now:
h #=> {1=>1}
Next, each passes the second value of enum into the block and similar calculations are performed:
k,h = enum.next
#=> [2, {1=>1}]
k #=> 2
h #=> {1=>1}
h[k] += 1
#=> 1
h #=> {1=>1, 2=>1}
Things are a little different when the third element of enum is passed in, because h now has a key 1:
k,h = enum.next
#=> [1, {1=>1, 2=>1}]
k #=> 1
h #=> {1=>1, 2=>1}
h[k] += 1
#=> h[k] = h[k] + 1
#=> h[1] = h[1] + 1
#=> h[1] = 1 + 1 => 2
h #=> {1=>2, 2=>1}
The remaining calculations are performed similarly.
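Putting the pieces together, here is one way the whole method might look with a counting hash. Treat it as a sketch rather than the definitive version: the names counts and max are my own, and computing the maximum once before the select is a small tweak that is not in the original code.

```ruby
def mode(array)
  # One pass over the array; missing keys start at the default of 0
  counts = array.each_with_object(Hash.new(0)) { |k, h| h[k] += 1 }

  # Compute the largest count once instead of once per key-value pair
  max = counts.values.max

  # Keep every key whose count equals the maximum (ties are all returned)
  counts.select { |_, v| v == max }.keys
end

p mode([1, 2, 1, 4, 3, 2, 1])  # [1]
p mode([1, 1, 2, 2, 3])        # [1, 2]
p mode([])                     # []
```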
Upvotes: 0
Reputation: 37507
Line by line:
answer = array.inject ({}) { |k, v| k[v]=array.count(v);k}
This assembles a hash of counts. I would not have called the variable answer because it is not the answer; it is an intermediate step. The inject() method (also known as reduce()) allows you to iterate over a collection, keeping an accumulator (e.g. a running total or in this case a hash collecting counts). It needs a starting value of {} so that the hash exists when attempting to store a value. Given the array [1,2,2,2,3,4,5,6,6] the counts would look like this: {1=>1, 2=>3, 3=>1, 4=>1, 5=>1, 6=>2}.
answer.select { |k,v| v == answer.values.max}.keys
This selects all elements in the above hash whose value is equal to the maximum value, in other words the highest count. Then it returns the keys associated with that maximum. Note that it will list multiple keys if they share the maximum count.
An alternative:
If you don't care about returning multiple modes, you could use group_by as follows:
array.group_by{|x|x}.values.max_by(&:size).first
or, in Ruby 2.2+:
array.group_by(&:itself).values.max_by(&:size).first
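To see what each step of that chain produces, here is a short walk-through using the example array from above. The names groups, largest and one_mode are only for illustration.

```ruby
array = [1, 2, 2, 2, 3, 4, 5, 6, 6]

# group_by collects equal elements under the same key
groups = array.group_by { |x| x }
p groups.values          # [[1], [2, 2, 2], [3], [4], [5], [6, 6]]

# max_by(&:size) picks the longest group, and first takes its value
largest = groups.values.max_by(&:size)
p largest                # [2, 2, 2]
p largest.first          # 2

# With a tie, only one of the tied values comes back
one_mode = [1, 1, 2, 2].group_by(&:itself).values.max_by(&:size).first
p one_mode
```

This is the trade-off compared with the select version: it returns a single element rather than an array of all modes.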
Upvotes: 2
Reputation: 4792
The inject method acts like an accumulator. Here is a simpler example:
sum = [1,2,3].inject(0) { |current_tally, new_value| current_tally + new_value }
The 0 is the starting point.
So after the first line, we have a hash that maps each number to the number of times it appears.
The mode calls for the most frequent element, and that is what the next line does: it selects only those elements whose count equals the maximum.
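If it helps, here is the same sum with a puts added so you can watch the accumulator change on each step. The steps array is only there to record what the block saw.

```ruby
steps = []

sum = [1, 2, 3].inject(0) do |current_tally, new_value|
  steps << [current_tally, new_value]
  puts "tally=#{current_tally}, value=#{new_value}"
  current_tally + new_value  # becomes current_tally on the next step
end

p sum    # 6
p steps  # [[0, 1], [1, 2], [3, 3]]
```

The hash version in the question works the same way, except that the accumulator is a hash instead of a number, which is why the block has to end by returning it.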
Upvotes: 0