Reputation: 495
Is there a way to set the seed for Ruby's built-in hash function (MurmurHash in 1.9; I don't know about JRuby) so that I get the same hash code every time I run the script, e.g. in parallel across multiple processes or on different nodes?
That is, I want the output of
puts "this is a test".hash
to be the same whenever I run it: today, tomorrow, 3 weeks from now, etc.
I want to do this so I can implement MinHash in parallel.
I can see in the murmur_hash gem that the murmur hash accepts a seed, so I assume I can set the seed and get the same hash code deterministically whenever I choose the same seed.
Upvotes: 2
Views: 2464
Reputation: 5794
Reviving this in case anyone wants to know...
You can use the murmurhash3 gem, located here.
You can override the hash method built into the String class.
require 'murmurhash3'

class String
  SEED = 12345678

  def hash
    MurmurHash3::V32.str_hash(self, SEED)
  end
end
Now you can use this hash function on any string.
$ irb
2.1.1 :001 > "this is a test".hash
=> 553036434
Assuming you use the same seed, 12345678, you should get the same hash every time on any server, process, or thread.
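If you would rather not reopen String (the built-in hash method is also what Hash uses for its keys), a standalone seeded helper gives the same cross-process stability. The following is only a minimal sketch using Ruby's standard library: `stable_hash` is a hypothetical name, and it swaps in MD5 with the seed prefixed to the input instead of MurmurHash, so its values will not match the murmurhash3 output above.

```ruby
require 'digest'

# Hypothetical helper, not part of the murmurhash3 gem.
# Assumption: MD5 over "seed:string" stands in for a seeded MurmurHash.
def stable_hash(str, seed = 12345678)
  digest = Digest::MD5.digest("#{seed}:#{str}")
  # Take the first 4 bytes as a big-endian unsigned 32-bit integer.
  digest.unpack('N').first
end

# The same input and seed give the same value in every process;
# a different seed gives an independent hash function, as MinHash needs.
first_run  = stable_hash("this is a test")
second_run = stable_hash("this is a test")
other_seed = stable_hash("this is a test", 42)
```

Because nothing here depends on per-process state, the result should be identical on any node running the same code.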
You can use the parallel gem, located here.
Then simply pass the list of items you want hashed in parallel.
require 'parallel'

items_to_hash = ['val0', 'val1',...., 'valN']
results = Parallel.map(items_to_hash) do |item|
  item.hash
end
If you're not into using another gem to run the hashes in parallel, here is an example that uses vanilla Ruby to get you going:
http://t-a-w.blogspot.com/2010/05/very-simple-parallelization-with-ruby.html
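For a rough idea of what a gem-free version can look like, here is a minimal sketch using only fork, pipes and Marshal from core Ruby. It is not the code from the linked post: `fork_map` is a hypothetical helper name, the worker count is an arbitrary assumption, and `Zlib.crc32` is just a stand-in for whichever deterministic hash you settle on. Since it relies on fork, it will not work on JRuby or Windows.

```ruby
require 'zlib'

# Hypothetical helper: map a block over items, one forked worker per slice.
# Results come back in the original order.
def fork_map(items, workers = 4, &block)
  return [] if items.empty?

  slice_size = (items.size / workers.to_f).ceil
  jobs = items.each_slice(slice_size).map do |slice|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      writer.binmode
      # Serialize this slice's results and send them to the parent.
      writer.write(Marshal.dump(slice.map { |item| block.call(item) }))
      writer.close
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    writer.close # parent keeps only the read end
    [pid, reader]
  end

  jobs.flat_map do |pid, reader|
    reader.binmode
    data = reader.read # read to EOF before waiting, so the child never blocks
    reader.close
    Process.wait(pid)
    Marshal.load(data)
  end
end

# Stand-in deterministic hash (assumption): CRC32 from the standard library.
hashes = fork_map(%w[val0 val1 val2]) { |item| Zlib.crc32(item) }
```

Each child computes its slice independently, so this only gives consistent results if the hash itself does not depend on per-process state.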
Upvotes: 1