berkes

Reputation: 27583

How to generate a unique identifier for a hash with a certain content?

For a caching layer, I need to create a unique sha for a hash. It should be unique for the content of that hash. Two hashes with the same config should have the same sha.

in_2014 = { scopes: [1, 2, 3], year: 2014 }
not_in_2014 = { scopes: [1, 2, 3], year: 2015 }
also_in_2014 = { year: 2014, scopes: [1, 2, 3] }

in_2014 == also_in_2014 #=> true
not_in_2014 == in_2014  #=> false

Now, in order to store it and look it up quickly, it needs to be turned into something like a shasum. Simply converting to a string does not work, because the string reflects the order in which the keys were inserted, so generating a hexdigest from it does not work either:

require 'digest'
in_2014.to_s == also_in_2014.to_s #=> false
Digest::SHA2.hexdigest(in_2014.to_s) == Digest::SHA2.hexdigest(also_in_2014.to_s) #=> false

What I want is a shasum or some other identifier that will allow me to compare the hashes with one another. I want something like the last test that will return true if the contents of the hashes match.

I could sort the hashes before calling to_s, yet that seems kludgy to me. I am, for one, afraid that I am overlooking something there (a sort returns an array, no longer a hash, for one). Is there something simple that I am overlooking? Or is this not possible at all?

FWIW, we need this in a scenario like below:

Analysis.find_by_config({scopes: [1,2], year: 2014}).datasets
Analysis.find_by_config({account_id: 1337}).datasets

class Analysis < ActiveRecord::Base
  # Keep the stored digest in sync with the config on every save.
  before_save :set_config_digest

  def self.find_by_config(config)
    find_by(config_digest: shasum_of(config))
  end

  def self.shasum_of(config)
    # WAT?
  end

  private

  def set_config_digest
    self.config_digest = Analysis.shasum_of(config)
  end
end

Note that here, Analysis does not have columns "scopes" or "year" or "account_id". These are arbitrary configs, that we only need for looking up the datasets.

Upvotes: 2

Views: 1590

Answers (2)

SHS

Reputation: 7744

I wouldn't recommend the hash method for this, because its value is not stable from one Ruby process to the next. You can quickly confirm this by running {one: 1}.hash in IRB, then in your Rails console, and then in IRB or a Rails console on another machine. The outputs will differ, so a value stored by one process cannot be looked up by another.

Sticking with Digest::SHA2.hexdigest(string) would be wiser.
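To see the difference without opening several consoles, here is a minimal sketch that evaluates both approaches in fresh Ruby processes. The helper name run_in_fresh_ruby is made up for this illustration, and it assumes the current interpreter can be launched as a subprocess via RbConfig.ruby:

```ruby
require 'digest'
require 'rbconfig'

# Hypothetical helper: runs a snippet in a brand-new Ruby interpreter
# and returns whatever that process printed.
def run_in_fresh_ruby(snippet)
  IO.popen([RbConfig.ruby, '-e', snippet], &:read).strip
end

hash_snippet   = 'puts({ one: 1 }.hash)'
digest_snippet = 'require "digest"; puts Digest::SHA2.hexdigest({ one: 1 }.to_s)'

# Each value below comes from its own process.
first_hash    = run_in_fresh_ruby(hash_snippet)
second_hash   = run_in_fresh_ruby(hash_snippet)
first_digest  = run_in_fresh_ruby(digest_snippet)
second_digest = run_in_fresh_ruby(digest_snippet)

puts "Hash#hash equal across processes?   #{first_hash == second_hash}"
puts "SHA2 digest equal across processes? #{first_digest == second_digest}"
```

The digest only depends on the bytes of the string it is given, which is why it is the safer thing to store in a database column.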

You'll have to sort the hash and stringify it of course. This is what I would do:

hash.sort.to_s

If you don't want an array, for whatever reason, turn it back into a hash.

Hash[hash.sort].to_s #=> string form of the key-sorted hash

And, for whatever reason, if you don't want to turn the hash into an array and then back into a hash, do the following for hash-to-sorted-hash:

def prepare_for_sum( hash )
  hash.keys.sort.each_with_object({}) do |key, return_hash|
    return_hash[key] = hash[key]
  end.to_s
end

With a few modifications to the method above, you can normalize the values too, which helps when the values are themselves Arrays or Hashes.
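One way such a modification could look is sketched below. The names canonicalize and config_digest are hypothetical, not part of Ruby or Rails. The sketch makes a few assumptions that may not fit every case: keys are compared by their string form (so :year and "year" count as the same key), nested hashes are ordered by key, arrays keep their order because [1, 2] and [2, 1] may well mean different configs, and JSON is used for serialization rather than to_s because the exact to_s format of a Hash is an implementation detail.

```ruby
require 'digest'
require 'json'

# Hypothetical helper: returns a copy of the value in which every hash,
# at any depth, has string keys inserted in sorted order.
def canonicalize(value)
  case value
  when Hash
    value.sort_by { |key, _| key.to_s }
         .map { |key, val| [key.to_s, canonicalize(val)] }
         .to_h
  when Array
    # Array order is treated as significant, so only the elements are normalized.
    value.map { |item| canonicalize(item) }
  else
    value
  end
end

# Hypothetical helper: hex digest that depends only on the config's content.
def config_digest(config)
  Digest::SHA256.hexdigest(JSON.generate(canonicalize(config)))
end

in_2014      = { scopes: [1, 2, 3], year: 2014 }
also_in_2014 = { year: 2014, scopes: [1, 2, 3] }
not_in_2014  = { scopes: [1, 2, 3], year: 2015 }

puts config_digest(in_2014) == config_digest(also_in_2014)
puts config_digest(in_2014) == config_digest(not_in_2014)
```

If array order should not matter for a given key, sorting that array before calling config_digest would be the place to handle it.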

Upvotes: 5

berkes

Reputation: 27583

Turns out, Ruby has a method for this exact case: Hash#hash.

in_2014.hash == also_in_2014.hash #=> true

Upvotes: 0
