Bigmac Bigmac
Bigmac Bigmac

Reputation: 19

Ruby. Slow hash value look up with large hash's

I'm working with a hash size of 10 million...

hash={'Aatater'=>2, 'Bbabber'=>3, 'Xaazerx'=>2, 'Caackersc'=>1}

searching for a Key is impressively quick. even if the key exist at the end of a hash...

hash['Caackersc']=>1

but if you search for a value that exist deep in the hash it will be painfully slow.

hash.key(1)=>"Caackersc"

So. My first attempt to achieve great speeds when searching for a value. I thought I would invert the hash. But this would cause duplicate values to be removed.

hash.invert{ 2=>'Xaazerx', 3=>'Bbabber', 1=>'Caackersc'}

So I then tried to invert the hash myself with out loosing data... by issuing a index number befor every new key.

I_hash=Hash.new

hash.to_a.each_with_index{|h,i| i_hash[[i.to_s,h[1].to_s]]=h[0]}

{["0","2"]=>'Aatater', ["1","3"]=>'Bbabber', ["2","2"]=>' Xaazerx', ["3","1"]=>'Caackersc'}

So. At this point I can search for the new key with the same wicked quick speeds.

i_hash[["1","3"]] => "Bbabber"

But now... I hope the index part of the key could be found with a regex?

I_hash[/\d/,"3"]=>fail, nil

so... this was my best attempt to speed up a value search but it will only work in my situation if I can regex the first array of the key.

Upvotes: 1

Views: 949

Answers (3)

Cary Swoveland
Cary Swoveland

Reputation: 110675

You could also compute the inverse like this:

hash.each_with_object({}) { |(k,v),h| (h[v] ||= []) << k }
  #=> {2=>["Aatater", "Xaazerx"], 3=>["Bbabber"], 1=>["Caackersc"]} 

Upvotes: 1

Jordan Running
Jordan Running

Reputation: 106027

You've got the right idea. For your inverted hash you want each value to be an array of the corresponding keys in the original hash. Something like this:

hash = { 'Aatater' => 2, 'Bbabber' => 3, 'Xaazerx' => 2, 'Caackersc' => 1 }

inv_hash = hash.keys.group_by {|k| hash[k] }

p inv_hash
# => { 2 => [ "Aatater", "Xaazerx" ],
#      3 => [ "Bbabber" ],
#      1 => [ "Caackersc" ] }

p inv_hash[2]
# => [ "Aatater", "Xaazerx" ]

Upvotes: 4

Damiano Stoffie
Damiano Stoffie

Reputation: 897

If you hit performance problem, you could consider using redis.

Upvotes: 0

Related Questions