Reputation: 155

Using group_by for only certain attributes

I have an array roads of objects that have many attributes. I would like to find which district_code (one attribute) has more than one state (another attribute).

Omitting the other attributes for simplicity, e.g.:

roads = [['1004', 'VIC'], ['1004', 'BRI'], ['1011', 'VIC'], ['1010', 'ACT'], ['1000', 'ACT'], ['1019', 'VIC'], ['1004', 'VIC']]

If I use roads.group_by { |ro| ro[0] }, I get the result:

=> {"1004"=>[["1004", "VIC"], ["1004", "BRI"], ["1004", "VIC"]], "1011"=>[["1011", "VIC"]], "1010"=>[["1010", "ACT"]], "1000"=>[["1000", "ACT"]], "1019"=>[["1019", "VIC"]]}

What I want is for the hash to show only the district codes that have more than one unique value for state, like so:

=> {"1004"=>["VIC", "BRI"]}

Any ideas on how to group_by or map based on the number of distinct values of a specific attribute within each group?

Thanks!

Upvotes: 0

Answers (3)

Cary Swoveland

Reputation: 110755

Let me add an additional element to roads so that the desired hash will have more than one key.

roads = [['1004', 'VIC'], ['1004', 'BRI'], ['1011', 'VIC'],
         ['1010', 'ACT'], ['1000', 'ACT'], ['1019', 'VIC'],
         ['1004', 'VIC'], ['1000', 'VIC']]

require 'set'

roads.each_with_object({}) do |(dc,st),h|     
  h[dc] = Set.new unless h.key?(dc)
  h[dc] << st
end.select {|k,v| v.size > 1}
   .transform_values(&:to_a)
  #=> {"1004"=>["VIC", "BRI"], "1000"=>["ACT", "VIC"]}

See Hash#select and Hash#transform_values.

Let's go through the steps.

f = roads.each_with_object({}) do |(dc,st),h|     
  h[dc] = Set.new unless h.key?(dc)
  h[dc] << st
end
  #=> {"1004"=>#<Set: {"VIC", "BRI"}>, "1011"=>#<Set: {"VIC"}>,
  #    "1010"=>#<Set: {"ACT"}>, "1000"=>#<Set: {"ACT", "VIC"}>,
  #    "1019"=>#<Set: {"VIC"}>}

Writing the block variables as |(dc,st),h| makes use of array decomposition. This is a powerful technique with many applications.

Here, suppose we write:

enum = roads.each_with_object({})
  #=> #<Enumerator: [["1004", "VIC"], ["1004", "BRI"],...,
  #                  ["1000", "VIC"]]:each_with_object({})>

When the first element is generated and passed to the block the following is executed to assign values to the block variables.

(dc,st),h = enum.next
  #=> [["1004", "VIC"], {}]

resulting in:

dc #=> "1004"
st #=> "VIC"
h  #=> {}

See Enumerator#next. This article is one of several that provides a fuller explanation of array decomposition.
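As a small illustration of the same idea outside each_with_object, here is a minimal sketch. The method name labels is hypothetical and used only for this example; it shows that the (dc, st) pattern works in a plain assignment and in any block that yields nested arrays.

```ruby
# Nested decomposition in a plain assignment mirrors the block variables above.
(dc, st), h = [["1004", "VIC"], {}]
# dc is "1004", st is "VIC", h is {}

# labels is a hypothetical helper: each_with_index yields [[dc, st], i],
# and |(dc, st), i| pulls the inner pair apart in the same way.
def labels(roads)
  roads.each_with_index.map { |(dc, st), i| "#{i}: #{dc}/#{st}" }
end

labels([['1004', 'VIC'], ['1011', 'BRI']])
  #=> ["0: 1004/VIC", "1: 1011/BRI"]
```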

Continuing,

g = f.select {|k,v| v.size > 1}
  #=> {"1004"=>#<Set: {"VIC", "BRI"}>, "1000"=>#<Set: {"ACT", "VIC"}>}
g.transform_values(&:to_a)
  #=> {"1004"=>["VIC", "BRI"], "1000"=>["ACT", "VIC"]}

There are two common variants to how f is computed. The first is

f = roads.each_with_object({}) do |(dc,st),h|
  (h[dc] ||= Set.new) << st
end

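h[dc] ||= Set.new behaves like h[dc] || (h[dc] = Set.new), which here gives the same result as h[dc] = h[dc] || Set.new. If h has a key dc, h[dc] is returned unchanged; otherwise h[dc] is nil, so h[dc] is set to a new, empty set. Either way a set s is returned and s << st is then executed. This relies on h having no values equal to nil or false, which holds here because every value is a set.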
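The behaviour of ||= for missing and existing keys can be checked with a short sketch. The helper name add_state is an assumption made for this example only.

```ruby
require 'set'

# add_state is a hypothetical helper wrapping the one-line idiom.
def add_state(h, dc, st)
  (h[dc] ||= Set.new) << st
  h
end

h = {}
add_state(h, '1004', 'VIC')  # key missing: a new Set is created and stored
add_state(h, '1004', 'BRI')  # key present: the existing Set is reused
add_state(h, '1004', 'VIC')  # duplicate value: the Set ignores it
h['1004'].to_a
  #=> ["VIC", "BRI"]
```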

The second variant is

roads.each_with_object(Hash.new {|h,k| h[k]=Set.new}) do |(dc,st),h|
  h[dc] << st
end

This uses the form of Hash::new that takes a block and no argument. If h is defined

h = Hash.new {|h,k| h[k]=Set.new}

then, possibly after key-value pairs have been added, whenever h[k] is referenced and h does not have a key k,

h[k] = Set.new

is executed. In this context, when h[dc] << st is encountered and h does not have a key dc, the block first executes h[dc] = Set.new, and st is then added to that new set.
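One consequence worth seeing is that the block runs only when a missing key is read with h[k], not when the hash is merely asked whether the key exists. A minimal sketch follows; new_set_hash is a hypothetical helper name.

```ruby
require 'set'

# new_set_hash is a hypothetical helper that builds the hash described above.
def new_set_hash
  Hash.new { |hash, key| hash[key] = Set.new }
end

h = new_set_hash
h.key?('1004')      # false: key? does not trigger the default block
h['1004'] << 'VIC'  # the block runs first, storing an empty Set under '1004'
h.key?('1004')      # true
h['1011']           # merely reading a missing key also adds it
h.keys
  #=> ["1004", "1011"]
```

Because reading a missing key adds it, key? or fetch may be a better choice when the hash should only be inspected.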

A variant of the initial solution is the following.

roads.each_with_object(Hash.new {|h,k| h[k]=[]}) {|(dc,st),h| h[dc] << st}
     .transform_values(&:uniq)
     .select {|k,v| v.size > 1}

Upvotes: 3

Rajagopalan

Reputation: 6064

Input

roads = [['1004', 'VIC'], ['1004', 'BRI'], ['1011', 'VIC'], ['1010', 'ACT'], ['1000', 'ACT'], ['1019', 'VIC'], ['1004', 'VIC']]

Code

p Hash[roads.group_by(&:first)
            .transform_values(&:uniq)
            .filter_map { |k, v| [k, v.map(&:last)] if v.length > 1 }]

Output

{"1004"=>["VIC", "BRI"]}

Upvotes: 2

Chris

Reputation: 36680

If you already can get:

{"1004"=>[["1004", "VIC"], ["1004", "BRI"], ["1004", "VIC"]], "1011"=>[["1011", "VIC"]], "1010"=>[["1010", "ACT"]], "1000"=>[["1000", "ACT"]], "1019"=>[["1019", "VIC"]]}

With:

roads.group_by { |ro| ro[0] }

Then you just need to select the entries with length greater than 1.

roads.group_by { |ro| ro[0] }.select { |k, v| v.length > 1 }

And I get:

{"1004"=>[["1004", "VIC"], ["1004", "BRI"], ["1004", "VIC"]]}

Then we can map that down to just the state names. This could be written on one line, but it is split up here for demonstration.

roads.group_by { |r| r[0] }                  \
     .select { |k, v| v.length > 1 }         \
     .map { |k, v| [k, v.map { |x| x[1] }] } \
     .to_h

And the result is:

{"1004"=>["VIC", "BRI", "VIC"]}
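If repeated states should be collapsed, as in the output the question asks for, one possible variation is to reduce each group to its unique states before filtering. This is only a sketch; multi_state_districts is a hypothetical method name, and Hash#transform_values assumes Ruby 2.4 or later.

```ruby
# multi_state_districts is a hypothetical helper, not part of the question.
def multi_state_districts(roads)
  roads.group_by(&:first)                                      # district => [[district, state], ...]
       .transform_values { |pairs| pairs.map(&:last).uniq }    # district => unique states
       .select { |_district, states| states.size > 1 }         # keep multi-state districts
end

roads = [['1004', 'VIC'], ['1004', 'BRI'], ['1011', 'VIC'], ['1010', 'ACT'],
         ['1000', 'ACT'], ['1019', 'VIC'], ['1004', 'VIC']]

multi_state_districts(roads)
  #=> {"1004"=>["VIC", "BRI"]}
```

Filtering after uniq also means a district that appears several times with the same state is left out, whereas filtering on the raw group length would keep it.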

Upvotes: 1
