Reputation: 26528
I am writing a method to generate cache keys for caching function results, the key is based on a combination of function name and hash value of parameters.
Currently I am using hashlib to hash the serialized version of parameters, however the operation is very expensive to serialize large objects, so what's the alternative?
#get the cache key for storage
def cache_get_key(*args):
import hashlib
serialise = []
for arg in args:
serialise.append(str(arg))
key = hashlib.md5("".join(serialise)).hexdigest()
return key
UPDATE: I tried using hash(str(args)), but if args have relatively large data in it, still takes long time to compute the hash value. Any better way to do it?
Actually str(args) with large data takes forever...
Upvotes: 3
Views: 3930
Reputation: 6780
I know this question is old, but I just want to add my 2 cents:
You don't have to create a list and then join it. Especially if the list is going to be discarded anyways. Use the hash function's .update()
method
Consider using a much faster non-cryptographic hash algo, especially if this is not meant to be a cryptographically-secure implementation.
Having said that, this is my suggested improvement:
import xxhash
#get the cache key for storage
def cache_get_key(*args):
hasher = xxhash.xxh3_64()
for arg in args:
hasher.update(str(arg))
return hasher.hexdigest()
This uses the (claimed to be) extremely fast xxHash NCHF*.
* NCHF = Non-Cryptographic Hash Function
Upvotes: 1
Reputation: 49816
Have you tried just using the hash
function? It works perfectly well on tuples.
Upvotes: 1
Reputation: 2909
I've seen people feed an arbitrary python object to random.seed(), and then use the first value back from random.random() as the "hash" value. It doesn't give a terrific distribution of values (can be skewed), but it seems to work for arbitrary objects.
If you don't need cryptographic-strength hashes, I came up with a pair of hash functions for a list of integers that I use in a bloom filter. They appear below. The bloom filter actually uses linear combinations of these two hash functions to obtain an arbitrarily large number of hash functions, but they should work fine in other contexts that just need a bit of scattering with a decent distribution. They're inspired by Knuth's writing on Linear Congruential Random Number Generation. They take a list of integers as input, which I believe could just be the ord()'s of your serialized characters.
MERSENNES1 = [ 2 ** x - 1 for x in [ 17, 31, 127 ] ]
MERSENNES2 = [ 2 ** x - 1 for x in [ 19, 67, 257 ] ]
def simple_hash(int_list, prime1, prime2, prime3):
'''Compute a hash value from a list of integers and 3 primes'''
result = 0
for integer in int_list:
result += ((result + integer + prime1) * prime2) % prime3
return result
def hash1(int_list):
'''Basic hash function #1'''
return simple_hash(int_list, MERSENNES1[0], MERSENNES1[1], MERSENNES1[2])
def hash2(int_list):
'''Basic hash function #2'''
return simple_hash(int_list, MERSENNES2[0], MERSENNES2[1], MERSENNES2[2])
Upvotes: 0
Reputation: 40599
def cache_get_key(*args):
return hash(str(args))
or (if you really want to use the hashlib library)
def cache_get_key(*args):
return hashlib.md5(str(args)).hexdigest()
I wouldn't bother rewriting code to make arrays into strings. Use the inbuilt one.
below is the solution @8bitwide suggested. No hashing required at all with this solution!
def foo(x, y):
return x+y+1
result1 = foo(1,1)
result2 = foo(2,3)
results = {}
results[foo] = {}
results[foo][ [1,1] ] = result1
results[foo][ [2,3] ] = result2
Upvotes: 2
Reputation: 91094
Assuming you made the object, and it is composed of smaller components (it is not a binary blob), you can precompute the hash when you build the object by using the hashes of its subcomponents.
For example, rather than serialize(repr(arg))
, do arg.precomputedHash if isinstance(arg, ...) else serialize(repr(arg))
If you neither make your own objects nor use hash
able objects, you can perhaps keep a memoization table of objectreferences -> hashes, assuming you don't mutate the objects. Worst case, you can use a functional language which allows for memoization, since all objects in such a language are probably immutable and hence hashable.
Upvotes: 1