Reputation: 27583
For a caching layer, I need to create a unique sha for a hash. It should be unique for the content of that hash. Two hashes with the same config should have the same sha.
in_2014 = { scopes: [1, 2, 3], year: 2014 }
not_in_2014 = { scopes: [1, 2, 3], year: 2015 }
also_in_2014 = { year: 2014, scopes: [1, 2, 3] }
in_2014 == also_in_2014 #=> true
not_in_2014 == in_2014 #=> false
Now, in order to store it and quickly look it up, it needs to be turned into something like a shasum. Simply converting to a string does not work, so generating a hexdigest from that string does not work either:
require 'digest'
in_2014.to_s == also_in_2014.to_s #=> false
Digest::SHA2.hexdigest(in_2014.to_s) == Digest::SHA2.hexdigest(also_in_2014.to_s) #=> false
What I want is a shasum or some other identifier that will allow me to compare the hashes with one another. I want something like the last test that will return true if the contents of the hashes match.
I could sort the hashes before calling to_s, yet that seems kludgy to me. For one, I am afraid that I am overlooking something there (a sort returns an array, no longer a hash). Is there something simple that I am overlooking? Or is this not possible at all?
FWIW, we need this in a scenario like below:
Analysis.find_by_config({scopes: [1,2], year: 2014}).datasets
Analysis.find_by_config({account_id: 1337}).datasets
class Analysis < ActiveRecord::Base
  def self.find_by_config(config)
    self.find_by(config_digest: shasum_of(config))
  end

  def self.shasum_of(config)
    #WAT?
  end

  def before_saving
    self.config_digest = Analysis.shasum_of(config)
  end
end
Note that here, Analysis does not have columns "scopes" or "year" or "account_id". These are arbitrary configs, that we only need for looking up the datasets.
Upvotes: 2
Views: 1590
Reputation: 7744
I wouldn't recommend the hash method because it is unreliable: Ruby seeds it per process, so the value is not stable between runs. You can quickly confirm this by executing {one: 1}.hash in IRB, running the same command in your Rails console, and then repeating it in IRB and/or the Rails console on another machine. The outputs will differ.
Sticking with Digest::SHA2.hexdigest(string) would be wiser.
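A quick way to see the difference without a second machine is to start fresh Ruby processes from a script. The sketch below is only an illustration; in_fresh_process is a hypothetical helper name, and it assumes the current Ruby executable can be spawned as a subprocess.

```ruby
require 'digest'
require 'rbconfig'

# Hypothetical helper: run a one-liner in a brand-new Ruby process
# and return whatever it prints.
def in_fresh_process(code)
  IO.popen([RbConfig.ruby, '-e', code], &:read)
end

hash_code   = 'print({ one: 1 }.hash)'
digest_code = 'require "digest"; print Digest::SHA2.hexdigest({ one: 1 }.to_s)'

# Ask two separate processes for each value.
hash_values   = Array.new(2) { in_fresh_process(hash_code) }
digest_values = Array.new(2) { in_fresh_process(digest_code) }

# Hash#hash is seeded per process, so the two values are expected to differ.
puts "distinct Hash#hash values: #{hash_values.uniq.size}"
# The digest depends only on the input string, so both processes agree.
puts "distinct digest values:    #{digest_values.uniq.size}"
```

Because a stored config_digest has to match across restarts and servers, only the second behaviour is usable for a lookup column.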
You'll have to sort the hash and stringify it of course. This is what I would do:
hash.sort.to_s
If you don't want an array, for whatever reason, turn it back into a hash.
Hash[hash.sort].to_s #=> string form of the key-sorted hash
And, for whatever reason, if you don't want to turn the hash into an array and then back into a hash, do the following for hash-to-sorted-hash:
def prepare_for_sum(hash)
  hash.keys.sort.each_with_object({}) do |key, return_hash|
    return_hash[key] = hash[key]
  end.to_s
end
With some modifications to the method above, you can sort the values too, which helps when the values are themselves Arrays or Hashes.
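One possible shape for such a modification is sketched below. It is not the only way to do it, and it makes a few assumptions that are not in the question: canonicalize is a hypothetical helper name, keys are converted to strings so that symbol and string keys digest the same, array order is treated as significant (sort inside the Array branch if it should not be), and JSON.generate is swapped in for to_s because the output of Hash#to_s changed in Ruby 3.4, which would invalidate stored digests after an upgrade.

```ruby
require 'digest'
require 'json'

# Hypothetical helper: rebuild the structure with hash keys in sorted
# order at every nesting level, so insertion order no longer matters.
def canonicalize(value)
  case value
  when Hash
    value.sort_by { |key, _| key.to_s }
         .to_h { |key, val| [key.to_s, canonicalize(val)] }
  when Array
    # Element order is kept as-is; nested hashes inside are still normalised.
    value.map { |item| canonicalize(item) }
  else
    value
  end
end

# Digest of the canonical form; equal configs give equal digests.
def shasum_of(config)
  Digest::SHA2.hexdigest(JSON.generate(canonicalize(config)))
end

in_2014      = { scopes: [1, 2, 3], year: 2014, filter: { b: 1, a: 2 } }
also_in_2014 = { filter: { a: 2, b: 1 }, year: 2014, scopes: [1, 2, 3] }
in_2015      = { scopes: [1, 2, 3], year: 2015, filter: { b: 1, a: 2 } }

puts shasum_of(in_2014) == shasum_of(also_in_2014) #=> true
puts shasum_of(in_2014) == shasum_of(in_2015)      #=> false
```

The body of shasum_of could be dropped into Analysis.shasum_of from the question, provided the config only contains values that JSON can represent.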
Upvotes: 5