code_aks

Reputation: 2074

Slice into chunks from arranged hash in ruby

I have a hash whose keys are in sorted order, and it has more than 1000 entries. How can I divide the hash into chunks based on key ranges?

Example:

h_main = {"1" => "a", "2" => "b", "9" => "c", ..............  "880" => "xx", "996" => "xyz", "998" => "lll", "1050" => "mnx"}

I have to divide the above hash into smaller hash chunks based on range:

h_result = {"1-100" => {"1" => "a", "2" => "b", "9" => "c" ..... "99" => "re"},
            "101-200" => {}
           ....
           ....

           "901-1000" => {"996" => "xyz", "998" => "lll"},
           "1001-1100" => {"1050" => "mnx"}
           }

I can do this with an each loop and a condition that merges each key-value pair into the respective hash, but that is a lengthy process.

Please help me find an optimized solution. Thanks in advance.

Upvotes: 3

Views: 698

Answers (3)

Cary Swoveland

Reputation: 110675

def doit(h, group_size)
  h.keys.
    slice_when { |k1,k2| k2.to_i/group_size > k1.to_i/group_size }.
    each_with_object({}) do |key_group,g|
      start_range = group_size * (key_group.first.to_i/group_size) 
      g["%d-%d" % [start_range, start_range+group_size-1]] = h.slice(*key_group)
    end
end
h = {"11"=>"a", "12"=>"b", "19"=>"c", "28"=>"xx", "29"=> "xyz",
     "42"=>"lll", "47"=>"mnx"}
doit(h, 10)
  #=> {"10-19"=>{"11"=>"a", "12"=>"b", "19"=>"c"},
  #    "20-29"=>{"28"=>"xx", "29"=>"xyz"},
  #    "40-49"=>{"42"=>"lll", "47"=>"mnx"}} 
doit(h, 15)
  #=> {"0-14"=>{"11"=>"a", "12"=>"b"},
  #    "15-29"=>{"19"=>"c", "28"=>"xx", "29"=>"xyz"},
  #    "30-44"=>{"42"=>"lll"}, "45-59"=>{"47"=>"mnx"}} 
doit(h, 20)
  #=> {"0-19"=>{"11"=>"a", "12"=>"b", "19"=>"c"},
  #    "20-39"=>{"28"=>"xx", "29"=>"xyz"},
  #    "40-59"=>{"42"=>"lll", "47"=>"mnx"}} 

See Enumerable#slice_when and Hash#slice.
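For readers unfamiliar with these two methods, here is a minimal sketch of each in isolation. The helper name consecutive_runs and the sample data are hypothetical and are not part of the answer above; Hash#slice requires Ruby 2.5 or later.

```ruby
# Hypothetical helper: split a sorted integer array into runs of
# consecutive values. slice_when starts a new chunk wherever the block
# is truthy for a pair of adjacent elements.
def consecutive_runs(nums)
  nums.slice_when { |a, b| b > a + 1 }.to_a
end

p consecutive_runs([1, 2, 4, 9, 10, 11, 15])
  #=> [[1, 2], [4], [9, 10, 11], [15]]

# Hash#slice returns a new hash holding only the requested keys;
# requested keys that are absent ("kiwi" here) are silently skipped.
prices = { "apple" => 3, "pear" => 5, "plum" => 2 }
p prices.slice("apple", "plum", "kiwi")
  #=> {"apple"=>3, "plum"=>2}
```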

The steps are as follows.

group_size = 10
a = h.keys
  #=> ["11", "12", "19", "28", "29", "42", "47"] 
b = a.slice_when { |k1,k2| k2.to_i/group_size > k1.to_i/group_size }
  #=> #<Enumerator: #<Enumerator::Generator:0x000056fa312199b8>:each>

We can see the elements that will be generated by this enumerator and passed to the block by converting it to an array.

b.to_a
  #=> [["11", "12", "19"], ["28", "29"], ["42", "47"]]

Lastly,

b.each_with_object({}) do |key_group,g|
  start_range = group_size * (key_group.first.to_i/group_size) 
  g["%d-%d" % [start_range, start_range+group_size-1]] =
    h.slice(*key_group)
end
  #=> {"10-19"=>{"11"=>"a", "12"=>"b", "19"=>"c"},
  #    "20-29"=>{"28"=>"xx", "29"=>"xyz"},
  #    "40-49"=>{"42"=>"lll", "47"=>"mnx"}} 

Note that:

  e = b.each_with_object({})
    #=> #<Enumerator: #<Enumerator:
    #     #<Enumerator::Generator:0x0000560a0fc12658>:each>:
    #     each_with_object({})> 
  e.to_a
    #=> [[["11", "12", "19"], {}], [["28", "29"], {}], [["42", "47"], {}]]

The last step begins by the enumerator e generating a value and passing it to the block, after which the block variables are assigned values using array decomposition.

key_group,g = e.next
  #=> [["11", "12", "19"], {}] 
key_group
  #=> ["11", "12", "19"] 
g #=> {} 

The block calculations are then performed.

start_range = group_size * (key_group.first.to_i/group_size)
  #=> 10 * (11/10) => 10
g["%d-%d" % [start_range, start_range+group_size-1]] =
  h.slice(*key_group)
  #=> g["%d-%d" % [10, 10+10-1]] = h.slice("11", "12", "19")
  #=> g["10-19"] = {"11"=>"a", "12"=>"b", "19"=>"c"}
  #=> {"11"=>"a", "12"=>"b", "19"=>"c"} 

Now,

g #=> {"10-19"=>{"11"=>"a", "12"=>"b", "19"=>"c"}}  

The enumerator e then generates another element, passes it to the block and the block variables are assigned.

key_group,g = e.next
  #=> [["28", "29"], {"10-19"=>{"11"=>"a", "12"=>"b", "19"=>"c"}}] 
key_group
  #=> ["28", "29"] 
g #=> {"10-19"=>{"11"=>"a", "12"=>"b", "19"=>"c"}} 

Notice that the value of g has been updated. The block calculations now proceed as before, after which:

g #=> {"10-19"=>{"11"=>"a", "12"=>"b", "19"=>"c"},
  #    "20-29"=>{"28"=>"xx", "29"=>"xyz"}} 

Then

key_group,g = e.next
  #=> [["42", "47"], {"10-19"=>{"11"=>"a", "12"=>"b", "19"=>"c"},
  #                   "20-29"=>{"28"=>"xx", "29"=>"xyz"}}] 
key_group
  #=> ["42", "47"] 
g #=> {"10-19"=>{"11"=>"a", "12"=>"b", "19"=>"c"},
  #    "20-29"=>{"28"=>"xx", "29"=>"xyz"}}

After the block calculations are performed:

g #=> {"10-19"=>{"11"=>"a", "12"=>"b", "19"=>"c"},
  #    "20-29"=>{"28"=>"xx", "29"=>"xyz"},
  #    "40-49"=>{"42"=>"lll", "47"=>"mnx"}} 

Then an exception is raised:

key_group,g = e.next
  #=> StopIteration (iteration reached an end)

causing the enumerator to return g.

Upvotes: 4

engineersmnky

Reputation: 29318

Since your Hash is already sorted by key, slice_when, as proposed by @CarySwoveland, would probably be more efficient. However, if the Hash were, or became, unsorted, the following solutions would still group the keys correctly.
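To make the sorted-versus-unsorted point concrete, here is a rough sketch that compares the two grouping strategies on key arrays only. The method names chunk_adjacent and chunk_by_bucket are hypothetical simplifications, not the code from either answer.

```ruby
# Adjacent-pair comparison (the slice_when idea): a new chunk starts
# only when the next key's bucket is larger than the previous key's,
# so the result is only correct when the keys are already sorted.
def chunk_adjacent(keys, size)
  keys.slice_when { |k1, k2| k2.to_i / size > k1.to_i / size }.to_a
end

# group_by computes each key's bucket independently of its neighbours,
# so the order of the keys does not affect which group a key lands in.
def chunk_by_bucket(keys, size)
  keys.group_by { |k| k.to_i / size }.values
end

sorted   = %w[11 12 28 42]
unsorted = %w[28 11 42 12]

p chunk_adjacent(sorted, 10)     #=> [["11", "12"], ["28"], ["42"]]
p chunk_by_bucket(sorted, 10)    #=> [["11", "12"], ["28"], ["42"]]
p chunk_adjacent(unsorted, 10)   #=> [["28", "11"], ["42", "12"]]
p chunk_by_bucket(unsorted, 10)  #=> [["28"], ["11", "12"], ["42"]]
```

With unsorted input the adjacent-pair version lumps "28" with "11" and "42" with "12", while group_by still puts each key in its own bucket.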

Using a lambda to group the keys:

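def group_numeric_range(h, group_size)
  groups = ->(n) do 
    # Subtract 1 so that a key equal to a multiple of group_size
    # (e.g. "20") lands in the range that ends at it ("11-20").
    g = (n.to_i - 1) / group_size
    "#{g * group_size + 1}-#{g * group_size + group_size}"
  end 
  h.group_by do |k,_| 
    groups.(k)
  end.transform_values(&:to_h)
end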

Example:

h = {"11"=>"a", "12"=>"b", "19"=>"c", "28"=>"xx", "29"=> "xyz",
     "42"=>"lll", "47"=>"mnx"}
group_numeric_range(h,10)
#=> {"11-20"=>{"11"=>"a", "12"=>"b", "19"=>"c"}, "21-30"=>{"28"=>"xx", "29"=>"xyz"}, "41-50"=>{"42"=>"lll", "47"=>"mnx"}}

Alternative:

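def group_numeric_range(h, group_size)
  groups = ->(n) do 
    # Subtract 1 so that a key equal to a multiple of group_size
    # (e.g. "20") lands in the range that ends at it ("11-20").
    g = (n.to_i - 1) / group_size
    "#{g * group_size + 1}-#{g * group_size + group_size}"
  end 
  # The default block creates an empty Hash the first time a range label is seen.
  h.each_with_object(Hash.new{|hash,key| hash[key] = {}}) do |(k,v),obj| 
    obj[groups.(k)].merge!(k=>v)
  end
end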

Update

Another option would be to build an Array of the groups first and then use each key's group index to place it. This version also outputs the empty ranges, e.g.

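def group_numeric_range(h, group_size)
  # Compare keys numerically; as strings, "9" would rank above "1050".
  max_key = h.keys.map(&:to_i).max
  groups = ((max_key - 1) / group_size + 1).times.map do |g|
    ["#{g * group_size + 1}-#{g * group_size + group_size}",{}]
  end
  h.each_with_object(groups) do |(k,v),obj| 
    # Keys 1..group_size map to index 0, the next group_size keys to 1, etc.
    obj[(k.to_i - 1) / group_size].last.merge!(k=>v)
  end.to_h
end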

h = {"11"=>"a", "12"=>"b", "19"=>"c", "28"=>"xx", "29"=> "xyz",
     "42"=>"lll", "47"=>"mnx"}
group_numeric_range(h,10)
#=> {"1-10"=>{}, "11-20"=>{"11"=>"a", "12"=>"b", "19"=>"c"}, "21-30"=>{"28"=>"xx", "29"=>"xyz"}, "31-40"=>{}, "41-50"=>{"42"=>"lll", "47"=>"mnx"}}

Upvotes: 4

benjessop

Reputation: 1959

This is how I would do it, though I'm not sure what you have tried already.

Creating a large hash:

hash = {}
1000.times do |x|
 hash[x] = "hi!"
end

Slicing by range:

hash.slice(*(1 .. 100))
#=> a hash containing only the keys from 1 to 100

Producing the desired hash:

def split_hash(range, hash)
  end_result = {}
  # Assumes integer keys that are roughly contiguous, so that
  # hash.count / range approximates the number of chunks.
  (hash.count / range).times do |x|
    range_start = (range * x) + 1
    range_end = range_start + range - 1 # inclusive, so chunks do not overlap
    # slice returns a hash, which is what was asked for. slice still
    # iterates over the given keys, but it performs well. If an array is
    # acceptable, hash.to_a[range_start .. range_end] gives range access.
    end_result["#{range_start}-#{range_end}"] = hash.slice(*(range_start .. range_end))
  end
  end_result
end
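The hash built in this answer uses integer keys, whereas the question's hash uses string keys such as "1" and "996". A minimal sketch of the same slice-by-range idea adapted to string keys might look like the following. The method name split_string_keyed_hash is hypothetical, and it assumes every key is the string form of a positive integer.

```ruby
# Hypothetical variant of the slice-by-range idea for string keys.
# Assumes each key is the string form of a positive integer.
def split_string_keyed_hash(hash, size)
  # Take the maximum numerically; as strings, "9" would sort above "1050".
  max_key = hash.keys.map(&:to_i).max || 0
  chunk_count = (max_key.to_f / size).ceil

  chunk_count.times.each_with_object({}) do |i, result|
    first = size * i + 1
    last  = first + size - 1
    # Convert the integer range to strings so Hash#slice can find the keys.
    result["#{first}-#{last}"] = hash.slice(*(first..last).map(&:to_s))
  end
end

h = { "1" => "a", "2" => "b", "9" => "c", "15" => "d", "31" => "e" }
p split_string_keyed_hash(h, 10)
  #=> {"1-10"=>{"1"=>"a", "2"=>"b", "9"=>"c"}, "11-20"=>{"15"=>"d"},
  #    "21-30"=>{}, "31-40"=>{"31"=>"e"}}
```

Because the chunk count comes from the largest key rather than from the number of entries, sparse hashes still cover every key, and empty ranges appear as empty hashes.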

Upvotes: 1
