Reputation: 2074
I have a hash whose keys are in sorted order, and the hash has more than 1000 entries. How can I divide the hash into chunks based on key ranges?
Example:
h_main = {"1" => "a", "2" => "b", "9" => "c", .............. "880" => "xx", "996" => "xyz", "998" => "lll", "1050" => "mnx"}
I have to divide the above hash into shorter hash chunks based on range:
h_result = {"1-100" => {"1" => "a", "2" => "b", "9" => "c" ..... "99" => "re"},
"101-200" => {}
....
....
"900-1000" => {"996" => "xyz", "998" => "lll"},
"1000-1100" => {"1050" => "mnx"}
}
I can do this with an each loop and a condition that merges each key-value pair into the respective hash, but that is a lengthy process.
Please help me find an optimized solution. Thanks in advance.
Upvotes: 3
Views: 698
Reputation: 110675
def doit(h, group_size)
  h.keys.
    slice_when { |k1,k2| k2.to_i/group_size > k1.to_i/group_size }.
    each_with_object({}) do |key_group,g|
      start_range = group_size * (key_group.first.to_i/group_size)
      g["%d-%d" % [start_range, start_range+group_size-1]] = h.slice(*key_group)
    end
end
h = {"11"=>"a", "12"=>"b", "19"=>"c", "28"=>"xx", "29"=> "xyz",
"42"=>"lll", "47"=>"mnx"}
doit(h, 10)
#=> {"10-19"=>{"11"=>"a", "12"=>"b", "19"=>"c"},
# "20-29"=>{"28"=>"xx", "29"=>"xyz"},
# "40-49"=>{"42"=>"lll", "47"=>"mnx"}}
doit(h, 15)
#=> {"0-14"=>{"11"=>"a", "12"=>"b"},
# "15-29"=>{"19"=>"c", "28"=>"xx", "29"=>"xyz"},
# "30-44"=>{"42"=>"lll"}, "45-59"=>{"47"=>"mnx"}}
doit(h, 20)
#=> {"0-19"=>{"11"=>"a", "12"=>"b", "19"=>"c"},
# "20-39"=>{"28"=>"xx", "29"=>"xyz"},
# "40-59"=>{"42"=>"lll", "47"=>"mnx"}}
See Enumerable#slice_when and Hash#slice.
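As a quick illustration of the two methods in isolation, here is a minimal sketch. The variable names are made up for this example, and Hash#slice assumes Ruby 2.5 or later.

```ruby
# slice_when starts a new chunk wherever the block returns true
# for a pair of adjacent elements.
keys = ["11", "12", "19", "28", "29", "42"]
chunks = keys.slice_when { |a, b| b.to_i / 10 > a.to_i / 10 }.to_a
# chunks is [["11", "12", "19"], ["28", "29"], ["42"]]

# Hash#slice returns a new hash containing only the given keys;
# keys that are absent from the receiver are silently skipped.
sample = { "11" => "a", "12" => "b", "28" => "xx" }
picked = sample.slice("11", "28", "99")
# picked is {"11"=>"a", "28"=>"xx"}
```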
The steps are as follows.
group_size = 10
a = h.keys
#=> ["11", "12", "19", "28", "29", "42", "47"]
b = a.slice_when { |k1,k2| k2.to_i/group_size > k1.to_i/group_size }
#=> #<Enumerator: #<Enumerator::Generator:0x000056fa312199b8>:each>
We can see the elements that will be generated by this enumerator and passed to the block by converting it to an array.
b.to_a
#=> [["11", "12", "19"], ["28", "29"], ["42", "47"]]
Lastly,
b.each_with_object({}) do |key_group,g|
  start_range = group_size * (key_group.first.to_i/group_size)
  g["%d-%d" % [start_range, start_range+group_size-1]] =
    h.slice(*key_group)
end
#=> {"10-19"=>{"11"=>"a", "12"=>"b", "19"=>"c"},
# "20-29"=>{"28"=>"xx", "29"=>"xyz"},
# "40-49"=>{"42"=>"lll", "47"=>"mnx"}}
Note that:
e = b.each_with_object({})
#=> #<Enumerator: #<Enumerator:
# #<Enumerator::Generator:0x0000560a0fc12658>:each>:
# each_with_object({})>
e.to_a
#=> [[["11", "12", "19"], {}], [["28", "29"], {}], [["42", "47"], {}]]
The last step begins with the enumerator e generating a value and passing it to the block, after which the block variables are assigned values using array decomposition.
key_group,g = e.next
#=> [["11", "12", "19"], {}]
key_group
#=> ["11", "12", "19"]
g #=> {}
The block calculations are then performed.
start_range = group_size * (key_group.first.to_i/group_size)
#=> 10 * (11/10) => 10
g["%d-%d" % [start_range, start_range+group_size-1]] =
h.slice(*key_group)
#=> g["%d-%d" % [10, 10+10-1]] = h.slice("11", "12", "19")
#=> g["10-19"] = {"11"=>"a", "12"=>"b", "19"=>"c"}
#=> {"11"=>"a", "12"=>"b", "19"=>"c"}
Now,
g #=> {"10-19"=>{"11"=>"a", "12"=>"b", "19"=>"c"}}
The enumerator e then generates another element and passes it to the block, and the block variables are assigned.
key_group,g = e.next
#=> [["28", "29"], {"10-19"=>{"11"=>"a", "12"=>"b", "19"=>"c"}}]
key_group
#=> ["28", "29"]
g #=> {"10-19"=>{"11"=>"a", "12"=>"b", "19"=>"c"}}
Notice that the value of g has been updated. The block calculations now proceed as before, after which:
g #=> {"10-19"=>{"11"=>"a", "12"=>"b", "19"=>"c"},
# "20-29"=>{"28"=>"xx", "29"=>"xyz"}}
Then
key_group,g = e.next
#=> [["42", "47"], {"10-19"=>{"11"=>"a", "12"=>"b", "19"=>"c"},
# "20-29"=>{"28"=>"xx", "29"=>"xyz"}}]
key_group
#=> ["42", "47"]
g #=> {"10-19"=>{"11"=>"a", "12"=>"b", "19"=>"c"},
# "20-29"=>{"28"=>"xx", "29"=>"xyz"}}
After the block calculations are performed:
g #=> {"10-19"=>{"11"=>"a", "12"=>"b", "19"=>"c"},
# "20-29"=>{"28"=>"xx", "29"=>"xyz"},
# "40-49"=>{"42"=>"lll", "47"=>"mnx"}}
Then an exception is raised:
key_group,g = e.next
#=> StopIteration (iteration reached an end)
causing the enumerator to return g.
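The same stepping can be reproduced by hand. The sketch below (a smaller hash and the name acc are illustrative, not part of the answer) drives the enumerator with next inside Kernel#loop, which rescues StopIteration and ends the loop quietly.

```ruby
h = { "11" => "a", "12" => "b", "28" => "xx" }
group_size = 10

acc = {}
e = h.keys
     .slice_when { |k1, k2| k2.to_i / group_size > k1.to_i / group_size }
     .each_with_object(acc)

# Kernel#loop rescues the StopIteration raised by e.next once the
# enumerator is exhausted, so the loop simply ends.
loop do
  key_group, g = e.next
  start_range = group_size * (key_group.first.to_i / group_size)
  # g is the same object as acc, so each assignment accumulates there.
  g["%d-%d" % [start_range, start_range + group_size - 1]] = h.slice(*key_group)
end

# acc is {"10-19"=>{"11"=>"a", "12"=>"b"}, "20-29"=>{"28"=>"xx"}}
```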
Upvotes: 4
Reputation: 29318
Since your Hash is already sorted by its keys, something like slice_when, as proposed by @CarySwoveland, would probably have an efficiency benefit; however, were the Hash to be, or become, unsorted, the following solutions would be unaffected as far as grouping goes.
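To make the ordering point concrete, here is a small sketch with a deliberately unsorted hash. The names unsorted, chunks and grouped are made up for this illustration, and the groups are keyed by the integer bucket number rather than a range label to keep it short.

```ruby
unsorted = { "28" => "xx", "11" => "a", "29" => "xyz", "12" => "b" }
group_size = 10

# slice_when only compares neighbouring keys, so with unsorted input
# keys from different ranges can end up in the same chunk.
chunks = unsorted.keys
                 .slice_when { |a, b| b.to_i / group_size > a.to_i / group_size }
                 .to_a
# chunks is [["28", "11"], ["29", "12"]]

# group_by looks at each key on its own, so the order does not matter.
grouped = unsorted.group_by { |k, _| k.to_i / group_size }
                  .transform_values(&:to_h)
# grouped is {2=>{"28"=>"xx", "29"=>"xyz"}, 1=>{"11"=>"a", "12"=>"b"}}
```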
Using a lambda to group the keys:
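def group_numeric_range(h, group_size)
  groups = ->(n) do
    # Subtract 1 so that a key equal to a multiple of group_size (e.g. "20")
    # falls under "11-20" rather than "21-30". Assumes positive integer keys.
    g = (n.to_i - 1) / group_size
    "#{g * group_size + 1}-#{g * group_size + group_size}"
  end
  h.group_by do |k,_|
    groups.(k)
  end.transform_values(&:to_h)
end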
Example:
h = {"11"=>"a", "12"=>"b", "19"=>"c", "28"=>"xx", "29"=> "xyz",
"42"=>"lll", "47"=>"mnx"}
group_numeric_range(h,10)
#=> {"11-20"=>{"11"=>"a", "12"=>"b", "19"=>"c"}, "21-30"=>{"28"=>"xx", "29"=>"xyz"}, "41-50"=>{"42"=>"lll", "47"=>"mnx"}}
Alternative:
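def group_numeric_range(h, group_size)
  groups = ->(n) do
    # Subtract 1 so that a key equal to a multiple of group_size (e.g. "20")
    # falls under "11-20" rather than "21-30". Assumes positive integer keys.
    g = (n.to_i - 1) / group_size
    "#{g * group_size + 1}-#{g * group_size + group_size}"
  end
  h.each_with_object(Hash.new{|h,k| h[k] = {}}) do |(k,v),obj|
    obj[groups.(k)].merge!(k=>v)
  end
end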
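The alternative relies on the block form of Hash.new to create each group's inner hash on first access. A minimal sketch of that behaviour on its own (the names groups and shared are illustrative):

```ruby
# The block runs only when a missing key is read; it stores a fresh
# empty hash under that key and returns it.
groups = Hash.new { |hash, key| hash[key] = {} }
groups["11-20"].merge!("11" => "a")
groups["11-20"].merge!("12" => "b")
groups["41-50"].merge!("42" => "lll")
# groups is {"11-20"=>{"11"=>"a", "12"=>"b"}, "41-50"=>{"42"=>"lll"}}

# Contrast: Hash.new({}) hands out ONE shared default hash and never
# stores it under the key, so nothing is actually added to the hash.
shared = Hash.new({})
shared["11-20"].merge!("11" => "a")
# shared is still empty, while the shared default now holds "11"=>"a"
```

This is why the block form is used there rather than Hash.new({}).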
Update
Another option would be to build an Array of the groups and then select the index for grouping (this version also outputs empty ranges), e.g.
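def group_numeric_range(h, group_size)
  # Compare keys numerically; with String keys, max would rank "9" above "47".
  # Assumes positive integer keys.
  max_key = h.keys.map(&:to_i).max.to_i
  groups = (((max_key - 1) / group_size) + 1).times.map do |g|
    ["#{g * group_size + 1}-#{g * group_size + group_size}",{}]
  end
  h.each_with_object(groups) do |(k,v),obj|
    # Subtract 1 so that e.g. "20" is stored under "11-20", matching the labels.
    obj[(k.to_i - 1) / group_size].last.merge!(k=>v)
  end.to_h
end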
h = {"11"=>"a", "12"=>"b", "19"=>"c", "28"=>"xx", "29"=> "xyz",
"42"=>"lll", "47"=>"mnx"}
group_numeric_range(h,10)
#=> {"1-10"=>{}, "11-20"=>{"11"=>"a", "12"=>"b", "19"=>"c"}, "21-30"=>{"28"=>"xx", "29"=>"xyz"}, "31-40"=>{}, "41-50"=>{"42"=>"lll", "47"=>"mnx"}}
Upvotes: 4
Reputation: 1959
This is how I would do it, though I am unsure what you have already tried.
Creating a large hash:
hash = {}
1000.times do |x|
hash[x] = "hi!"
end
Slicing by range:
hash.slice(*(1 .. 100))
#=> a hash of the key-value pairs whose keys are in 1 .. 100
Producing the desired hash:
def split_hash(range, hash)
  end_result = {}
  # Number of chunks needed to cover the largest key (assumes positive Integer keys).
  chunk_count = (hash.keys.max.to_f / range).ceil
  chunk_count.times do |x|
    range_start = (range * x) + 1
    range_end = range_start + range - 1 # inclusive upper bound, so chunks do not overlap
    # Hash#slice returns a hash, which is what was asked for. It still looks up
    # every key in the range, but it performs well. If an array is acceptable,
    # hash.to_a[range_start .. range_end] gives positional range access instead.
    end_result["#{range_start}-#{range_end}"] = hash.slice(*(range_start .. range_end))
  end
  end_result
end
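The question's keys are Strings rather than Integers, so the range would have to be converted before slicing. A minimal sketch under that assumption follows; the method name split_string_keyed_hash and the sample data are hypothetical, and chunk boundaries follow a 1-100, 101-200 convention.

```ruby
def split_string_keyed_hash(hash, size)
  # Compare keys numerically; fall back to 0 for an empty hash.
  max_key = hash.keys.map(&:to_i).max || 0
  chunk_count = (max_key.to_f / size).ceil
  chunk_count.times.each_with_object({}) do |x, result|
    first = size * x + 1
    last = size * (x + 1)
    # Convert the Integer range to String keys before slicing.
    result["#{first}-#{last}"] = hash.slice(*(first..last).map(&:to_s))
  end
end

h_main = { "1" => "a", "2" => "b", "9" => "c", "150" => "xx", "201" => "yy" }
result = split_string_keyed_hash(h_main, 100)
# result is {"1-100"=>{"1"=>"a", "2"=>"b", "9"=>"c"},
#            "101-200"=>{"150"=>"xx"},
#            "201-300"=>{"201"=>"yy"}}
```

Empty ranges between the smallest and largest key appear as empty hashes, as in the question's desired output.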
Upvotes: 1