Reputation: 155
I have an array roads of objects that have many attributes. What I would like to do is find which district_code (one attribute) has more than one state (another attribute).
Omitting the other attributes for simplicity, e.g.:
roads = [['1004', 'VIC'], ['1004', 'BRI'], ['1011', 'VIC'], ['1010', 'ACT'], ['1000', 'ACT'], ['1019', 'VIC'], ['1004', 'VIC']]
If I was to use roads.group_by { |ro| ro[0] }
I get the result:
=> {"1004"=>[["1004", "VIC"], ["1004", "BRI"], ["1004", "VIC"]], "1011"=>[["1011", "VIC"]], "1010"=>[["1010", "ACT"]], "1000"=>[["1000", "ACT"]], "1019"=>[["1019", "VIC"]]}
What I want is for the hash to show only the entries that have more than one unique value for state, like so:
=> {"1004"=>["VIC", "BRI"]}
Any ideas on how to group_by or map by the number of values / or for a specific attribute within a value?
Thanks!
Upvotes: 0
Views: 296
Reputation: 110755
Let me add an additional element to roads so that the desired hash will have more than one key.
roads = [['1004', 'VIC'], ['1004', 'BRI'], ['1011', 'VIC'],
         ['1010', 'ACT'], ['1000', 'ACT'], ['1019', 'VIC'],
         ['1004', 'VIC'], ['1000', 'VIC']]
require 'set'
roads.each_with_object({}) do |(dc,st),h|
h[dc] = Set.new unless h.key?(dc)
h[dc] << st
end.select {|k,v| v.size > 1}
.transform_values(&:to_a)
#=> {"1004"=>["VIC", "BRI"], "1000"=>["ACT", "VIC"]}
See Hash#select and Hash#transform_values.
Let's go through the steps.
f = roads.each_with_object({}) do |(dc,st),h|
h[dc] = Set.new unless h.key?(dc)
h[dc] << st
end
#=> {"1004"=>#<Set: {"VIC", "BRI"}>, "1011"=>#<Set: {"VIC"}>,
# "1010"=>#<Set: {"ACT"}>, "1000"=>#<Set: {"ACT", "VIC"}>,
# "1019"=>#<Set: {"VIC"}>}
Writing the block variables as |(dc,st),h| makes use of array decomposition. This is a powerful technique with many applications.
Here, suppose we write:
enum = roads.each_with_object({})
#=> #<Enumerator: [["1004", "VIC"], ["1004", "BRI"],...,
# ["1000", "VIC"]]:each_with_object({})>
When the first element is generated and passed to the block, the following is, in effect, executed to assign values to the block variables.
(dc,st),h = enum.next
#=> [["1004", "VIC"], {}]
resulting in:
dc #=> "1004"
st #=> "VIC"
h #=> {}
See Enumerator#next. This article is one of several that provides a fuller explanation of array decomposition.
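As a small illustration of the decomposition described above, the sketch below uses a shortened, made-up list of pairs (the names pairs, seen and acc are only for this example). It shows the parenthesised block parameters and the equivalent multiple assignment.

```ruby
pairs = [['1004', 'VIC'], ['1000', 'ACT']]

# Parentheses in the block parameters split each inner array into dc and st;
# the second block variable receives the memo object.
seen = pairs.each_with_object([]) do |(dc, st), acc|
  acc << "#{dc}:#{st}"
end

# The same decomposition works in an ordinary multiple assignment.
(a, b), c = [['1004', 'VIC'], {}]

p seen      # the collected "code:state" strings
p [a, b, c] # the three decomposed values
```

Without the parentheses, dc would receive the whole inner array and st the memo object.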
Continuing,
g = f.select {|k,v| v.size > 1}
#=> {"1004"=>#<Set: {"VIC", "BRI"}>, "1000"=>#<Set: {"ACT", "VIC"}>}
g.transform_values(&:to_a)
#=> {"1004"=>["VIC", "BRI"], "1000"=>["ACT", "VIC"]}
There are two common variants of how f is computed. The first is
f = roads.each_with_object({}) do |(dc,st),h|
(h[dc] ||= Set.new) << st
end
h[dc] ||= Set.new expands, in effect, to h[dc] = h[dc] || Set.new. If h has a key dc this becomes h[dc] = h[dc]; otherwise it becomes h[dc] = nil || Set.new, which is h[dc] = Set.new. Either way a set s is returned and s << st is then executed. This works so long as h has no values equal to nil or false.
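The behaviour of ||= described above can be checked in isolation. This is a minimal sketch with a hand-built hash; the hash g and its key 'x' are hypothetical and exist only to show the nil caveat.

```ruby
require 'set'

h = {}
(h['1004'] ||= Set.new) << 'VIC'  # key missing: a new Set is stored, then appended to
(h['1004'] ||= Set.new) << 'BRI'  # key present: the existing Set is reused
(h['1004'] ||= Set.new) << 'VIC'  # already in the Set, so nothing changes

states = h['1004'].to_a

# Caveat: a stored nil (or false) is treated like a missing key and replaced.
g = { 'x' => nil }
g['x'] ||= Set.new

p states
p g['x'].class
```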
The second variant is
roads.each_with_object(Hash.new {|h,k| h[k]=Set.new}) do |(dc,st),h|
h[dc] << st
end
This uses the form of Hash::new that takes a block and no argument. If h is defined as
h = Hash.new {|h,k| h[k]=Set.new}
then, possibly after key-value pairs have been added, if h[k] is referenced and h does not have a key k,
h[k] = Set.new
is executed. In this context, when h[dc] << st is encountered and h does not have a key dc, h[dc] = Set.new is executed before h[dc] << st.
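To see when the default block fires, here is a short sketch using a standalone hash. The key '9999' is a made-up value that is never stored.

```ruby
require 'set'

h = Hash.new { |hash, key| hash[key] = Set.new }

before = h.key?('1004')  # nothing has been stored yet
h['1004'] << 'VIC'       # block runs, stores an empty Set, then 'VIC' is added
h['1004'] << 'BRI'       # key now exists, so the block is not run again

# Hash#key? does not invoke the default block, so no entry is created here.
missing = h.key?('9999')

p before
p h
p missing
```

Because the block assigns hash[key], each missing key gets its own Set rather than a single shared default object.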
A variant of the initial solution is the following.
roads.each_with_object(Hash.new {|h,k| h[k]=[]}) {|(dc,st),h| h[dc] << st}
.transform_values(&:uniq)
.select {|k,v| v.size > 1}
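For comparison with the Set version, the array variant can be broken into steps. This sketch assumes the eight-element roads from the start of this answer; grouped and result are names introduced here.

```ruby
roads = [['1004', 'VIC'], ['1004', 'BRI'], ['1011', 'VIC'],
         ['1010', 'ACT'], ['1000', 'ACT'], ['1019', 'VIC'],
         ['1004', 'VIC'], ['1000', 'VIC']]

# Step 1: collect every state per district code, duplicates included.
grouped = roads.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(dc, st), h|
  h[dc] << st
end

# Step 2: drop duplicate states, then keep districts with more than one state.
result = grouped.transform_values(&:uniq).select { |_k, v| v.size > 1 }

p grouped['1004']
p result
```

Here uniq does the deduplication that Set handled implicitly in the first solution.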
Upvotes: 3
Reputation: 6064
Input
roads = [['1004', 'VIC'], ['1004', 'BRI'], ['1011', 'VIC'], ['1010', 'ACT'], ['1000', 'ACT'], ['1019', 'VIC'], ['1004', 'VIC']]
Code
p Hash[roads.group_by(&:first)
.transform_values(&:uniq)
.filter_map { |k, v| [k, v.map(&:last)] if v.length > 1 }]
Output
{"1004"=>["VIC", "BRI"]}
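The chain above can also be read one step at a time. The sketch below splits it into intermediate variables (grouped, unique, pairs and result are names added for illustration). Note that filter_map needs Ruby 2.7 or later.

```ruby
roads = [['1004', 'VIC'], ['1004', 'BRI'], ['1011', 'VIC'], ['1010', 'ACT'],
         ['1000', 'ACT'], ['1019', 'VIC'], ['1004', 'VIC']]

grouped = roads.group_by(&:first)           # district code => all rows for it
unique  = grouped.transform_values(&:uniq)  # remove repeated [code, state] rows

# Keep only codes with more than one distinct row, reducing rows to states.
pairs = unique.filter_map { |k, v| [k, v.map(&:last)] if v.length > 1 }

result = pairs.to_h  # same effect as wrapping the pairs in Hash[...]

p unique['1004']
p pairs
p result
```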
Upvotes: 2
Reputation: 36680
If you can already get:
{"1004"=>[["1004", "VIC"], ["1004", "BRI"], ["1004", "VIC"]], "1011"=>[["1011", "VIC"]], "1010"=>[["1010", "ACT"]], "1000"=>[["1000", "ACT"]], "1019"=>[["1019", "VIC"]]}
With:
roads.group_by { |ro| ro[0] }
Then you just need to select the entries with length greater than 1.
roads.group_by { |ro| ro[0] }.select { |k, v| v.length > 1 }
And I get:
{"1004"=>[["1004", "VIC"], ["1004", "BRI"], ["1004", "VIC"]]}
Then we can map that down to just the names. Could be one line, but split up for demonstration.
roads.group_by { |r| r[0] } \
.select { |k, v| v.length > 1 } \
.map { |k, v| [k, v.map { |x| x[1] }] } \
.to_h
And the result is:
{"1004"=>["VIC", "BRI", "VIC"]}
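Note that this result keeps the repeated "VIC", and the select counts rows rather than distinct states, so a district listed twice with the same state would also pass the filter. One possible adjustment, offered as a sketch and not part of the answer above, is to reduce each group to its unique states before filtering. The second ['1019', 'VIC'] row is a hypothetical addition to show the difference.

```ruby
roads = [['1004', 'VIC'], ['1004', 'BRI'], ['1011', 'VIC'], ['1010', 'ACT'],
         ['1000', 'ACT'], ['1019', 'VIC'], ['1004', 'VIC'],
         ['1019', 'VIC']]  # hypothetical duplicate row, not in the question

result = roads.group_by { |r| r[0] }
              .transform_values { |rows| rows.map { |x| x[1] }.uniq }
              .select { |_k, states| states.length > 1 }

p result
```

With this change "1019" is excluded, because its two rows share a single state.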
Upvotes: 1