Phrogz
Phrogz

Reputation: 303178

How to merge array of hashes to get hash of arrays of values

This is the opposite of Turning a Hash of Arrays into an Array of Hashes in Ruby.

Elegantly and/or efficiently turn an array of hashes into a hash where the values are arrays of all values:

hs = [
  { a:1, b:2 },
  { a:3, c:4 },
  { b:5, d:6 }
]
collect_values( hs )
#=> { :a=>[1,3], :b=>[2,5], :c=>[4], :d=>[6] }

This terse code almost works, but fails to create an array when there are no duplicates:

def collect_values( hashes )
  hashes.inject({}){ |a,b| a.merge(b){ |_,x,y| [*x,*y] } }
end
collect_values( hs )
#=> { :a=>[1,3], :b=>[2,5], :c=>4, :d=>6 }

This code works, but can you write a better version?

def collect_values( hashes )
  # Requires Ruby 1.8.7+ for Object#tap
  Hash.new{ |h,k| h[k]=[] }.tap do |result|
    hashes.each{ |h| h.each{ |k,v| result[k]<<v } }
  end
end

Solutions that only work in Ruby 1.9 are acceptable, but should be noted as such.


Here are the results of benchmarking the various answers below (and a few more of my own), using three different arrays of hashes:

               user     system      total        real
Phrogz 2a  0.577000   0.000000   0.577000 (  0.576000)
Phrogz 2b  0.624000   0.000000   0.624000 (  0.620000)
Glenn 1    0.640000   0.000000   0.640000 (  0.641000)
Phrogz 1   0.671000   0.000000   0.671000 (  0.668000)
Michael 1  0.702000   0.000000   0.702000 (  0.700000)
Michael 2  0.717000   0.000000   0.717000 (  0.726000)
Glenn 2    0.765000   0.000000   0.765000 (  0.764000)
fl00r      0.827000   0.000000   0.827000 (  0.836000)
sawa       0.874000   0.000000   0.874000 (  0.868000)
Tokland 1  0.873000   0.000000   0.873000 (  0.876000)
Tokland 2  1.077000   0.000000   1.077000 (  1.073000)
Phrogz 3   2.106000   0.093000   2.199000 (  2.209000)

The fastest code is this method that I added:

def collect_values(hashes)
  {}.tap{ |r| hashes.each{ |h| h.each{ |k,v| (r[k]||=[]) << v } } }
end

I've accepted "glenn mcdonald's answer" as it was competitive in terms of speed, reasonably terse, but (most importantly) because it pointed out the danger of using a Hash with a self-modifying default proc for convenient construction, as this may introduce bad changes when the user is indexing it later on.

Finally, here's the benchmark code, in case you want to run your own comparisons:

require 'prime'   # To generate the third hash
require 'facets'  # For tokland1's map_by
AZSYMBOLS = (:a..:z).to_a
TESTS = {
  '26 Distinct Hashes'   => AZSYMBOLS.zip(1..26).map{|a| Hash[*a] },
  '26 Same-Key Hashes'   => ([:a]*26).zip(1..26).map{|a| Hash[*a] },
  '26 Mixed-Keys Hashes' => (2..27).map do |i|
    factors = i.prime_division.transpose
    Hash[AZSYMBOLS.values_at(*factors.first).zip(factors.last)]
  end
}

def phrogz1(hashes)
  Hash.new{ |h,k| h[k]=[] }.tap do |result|
    hashes.each{ |h| h.each{ |k,v| result[k]<<v } }
  end
end
def phrogz2a(hashes)
  {}.tap{ |r| hashes.each{ |h| h.each{ |k,v| (r[k]||=[]) << v } } }
end
def phrogz2b(hashes)
  hashes.each_with_object({}){ |h,r| h.each{ |k,v| (r[k]||=[]) << v } }
end
def phrogz3(hashes)
  result = hashes.inject({}){ |a,b| a.merge(b){ |_,x,y| [*x,*y] } }
  result.each{ |k,v| result[k] = [v] unless v.is_a? Array }
end
def glenn1(hs)
  hs.reduce({}) {|h,pairs| pairs.each {|k,v| (h[k] ||= []) << v}; h}
end
def glenn2(hs)
  hs.map(&:to_a).flatten(1).reduce({}) {|h,(k,v)| (h[k] ||= []) << v; h}
end
def fl00r(hs)
  h = Hash.new{|h,k| h[k]=[]}
  hs.map(&:to_a).flatten(1).each{|v| h[v[0]] << v[1]}
  h
end
def sawa(a)
  a.map(&:to_a).flatten(1).group_by{|k,v| k}.each_value{|v| v.map!{|k,v| v}}
end
def michael1(hashes)
  h = Hash.new{|h,k| h[k]=[]}
  hashes.each_with_object(h) do |h, result|
    h.each{ |k, v| result[k] << v }
  end
end
def michael2(hashes)
  h = Hash.new{|h,k| h[k]=[]}
  hashes.inject(h) do |result, h|
    h.each{ |k, v| result[k] << v }
    result
  end
end
def tokland1(hs)
  hs.map(&:to_a).flatten(1).map_by{ |k, v| [k, v] }
end
def tokland2(hs)
  Hash[hs.map(&:to_a).flatten(1).group_by(&:first).map{ |k, vs|
    [k, vs.map{|o|o[1]}]
  }]
end

require 'benchmark'
N = 10_000
Benchmark.bm do |x|
  x.report('Phrogz 2a'){ TESTS.each{ |n,h| N.times{ phrogz2a(h) } } }
  x.report('Phrogz 2b'){ TESTS.each{ |n,h| N.times{ phrogz2b(h) } } }
  x.report('Glenn 1  '){ TESTS.each{ |n,h| N.times{ glenn1(h)   } } }
  x.report('Phrogz 1 '){ TESTS.each{ |n,h| N.times{ phrogz1(h)  } } }
  x.report('Michael 1'){ TESTS.each{ |n,h| N.times{ michael1(h) } } }
  x.report('Michael 2'){ TESTS.each{ |n,h| N.times{ michael2(h) } } }
  x.report('Glenn 2  '){ TESTS.each{ |n,h| N.times{ glenn2(h)   } } }
  x.report('fl00r    '){ TESTS.each{ |n,h| N.times{ fl00r(h)    } } }
  x.report('sawa     '){ TESTS.each{ |n,h| N.times{ sawa(h)     } } }
  x.report('Tokland 1'){ TESTS.each{ |n,h| N.times{ tokland1(h) } } }
  x.report('Tokland 2'){ TESTS.each{ |n,h| N.times{ tokland2(h) } } }
  x.report('Phrogz 3 '){ TESTS.each{ |n,h| N.times{ phrogz3(h)  } } }

end

Upvotes: 48

Views: 22923

Answers (8)

Muneiah
Muneiah

Reputation: 1

[{'a' => 1}, {'b' => 2}, {'c' => 3}].reduce Hash.new, :merge

Upvotes: -2

Ich
Ich

Reputation: 1378

What about this one?

hs.reduce({}, :merge)

shortest! But performance is pretty bad:

       user     system      total        real
 Phrogz 2a  0.240000   0.010000   0.250000 (  0.247337)
 Phrogz 2b  0.280000   0.000000   0.280000 (  0.274985)
 Glenn 1    0.290000   0.000000   0.290000 (  0.290370)
 Phrogz 1   0.310000   0.000000   0.310000 (  0.315548)
 Michael 1  0.360000   0.000000   0.360000 (  0.356760)
 Michael 2  0.360000   0.000000   0.360000 (  0.360119)
 Glenn 2    0.370000   0.000000   0.370000 (  0.369354)
 fl00r      0.390000   0.000000   0.390000 (  0.385883)
 sawa       0.410000   0.000000   0.410000 (  0.408190)
 Tokland 1  0.410000   0.000000   0.410000 (  0.410097)
 Tokland 2  0.490000   0.000000   0.490000 (  0.497325)
 Ich        1.410000   0.000000   1.410000 (  1.413176) # <<-- new
 Phrogz 3   1.760000   0.010000   1.770000 (  1.762979)

Upvotes: 0

Cary Swoveland
Cary Swoveland

Reputation: 110675

I thought it might be interesting to compare the winner:

def phrogz2a(hashes)
  {}.tap{ |r| hashes.each{ |h| h.each{ |k,v| (r[k]||=[]) << v } } }
end

with a slight variant:

def phrogz2ai(hashes)
  Hash.new {|h,k| h[k]=[]}.tap {|r| hashes.each {|h| h.each {|k,v| r[k] << v}}}
end

because one can often employ either approach (typically to create an empty array or hash).

Using Phrogz's benchmark code, here's how they compare here:

            user     system      total        real
Phrogz 2a   0.440000   0.010000   0.450000 (  0.444435)
Phrogz 2ai  0.580000   0.010000   0.590000 (  0.580248)

Upvotes: 1

tokland
tokland

Reputation: 67850

Facet's Enumerable#map_by comes in handy for these cases. This implementation will be no doubt slower than others, but modular and compact code is always easier to maintain:

require 'facets'
hs.flat_map(&:to_a).map_by { |k, v| [k, v] }
#=> {:b=>[2, 5], :d=>[6], :c=>[4], :a=>[1, 3]

Upvotes: 1

fl00r
fl00r

Reputation: 83680

h = Hash.new{|h,k| h[k]=[]}
hs.map(&:to_a).flatten(1).each{|v| h[v[0]] << v[1]}

Upvotes: 3

sawa
sawa

Reputation: 168081

Same with some other answers using map(&:to_a).flatten(1). The problem is how to modify the values of the hash. I used the fact that arrays are mutable.

def collect_values a
  a.map(&:to_a).flatten(1).group_by{|k, v| k}.
  each_value{|v| v.map!{|k, v| v}}
end

Upvotes: 1

glenn mcdonald
glenn mcdonald

Reputation: 15478

Take your pick:

hs.reduce({}) {|h,pairs| pairs.each {|k,v| (h[k] ||= []) << v}; h}

hs.map(&:to_a).flatten(1).reduce({}) {|h,(k,v)| (h[k] ||= []) << v; h}

I'm strongly against messing with the defaults for hashes, as the other suggestions do, because then checking for a value modifies the hash, which seems very wrong to me.

Upvotes: 24

Michael Kohl
Michael Kohl

Reputation: 66837

How's this?

def collect_values(hashes)
  h = Hash.new{|h,k| h[k]=[]}
  hashes.each_with_object(h) do |h, result|
    h.each{ |k, v| result[k] << v }
  end
end

Edit - Also possible with inject, but IMHO not as nice:

def collect_values( hashes )
  h = Hash.new{|h,k| h[k]=[]}
  hashes.inject(h) do |result, h|
    h.each{ |k, v| result[k] << v }
    result
  end
end

Upvotes: 1

Related Questions