Reputation: 21985
I want to hash a simple array of strings.
The documentation says you can't simply feed a string into hashlib's update() function,
so I tried passing a regular variable instead, but then I got this error:
TypeError: object supporting the buffer API required
Here's what I have so far:
def generateHash(data):
    # Prepare the project id hash
    hashId = hashlib.md5()
    hashId.update(data)
    return hashId.hexdigest()
Upvotes: 9
Views: 18934
Reputation: 20794
You can use the repr() function to get the (Unicode) string representation of the array (or of any object that implements conversion to a representation). Then encode the string to UTF-8 (the byte order is the same everywhere when using UTF-8). The resulting bytes can be hashed the way you tried above:
#!python3
import hashlib


def hashFor(data):
    # Prepare the project id hash
    hashId = hashlib.md5()
    hashId.update(repr(data).encode('utf-8'))
    return hashId.hexdigest()


if __name__ == '__main__':
    data1 = ['abc', 'de']
    data2 = ['a', 'bcde']
    print(hashFor(data1) + ':', data1)
    print(hashFor(data2) + ':', data2)
It prints on my console:
c:\tmp\___python\skerit\so17412304>py a.py
d26d27d8cbb7c6fe50637155c21d5af6: ['abc', 'de']
dbd5ab5df464b8bcee61fe8357f07b6e: ['a', 'bcde']
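To see why the two lists end up with different hashes, it may help to look at the text that actually gets hashed. The following is only a small sketch; hash_for is a hypothetical, shortened variant of the function above:

```python
import hashlib


def hash_for(data):
    # Same idea as above: hash the UTF-8 bytes of repr(data).
    return hashlib.md5(repr(data).encode('utf-8')).hexdigest()


data1 = ['abc', 'de']
data2 = ['a', 'bcde']

# repr() keeps the quotes, commas and brackets, so the element
# boundaries become part of the hashed text.
print(repr(data1))  # ['abc', 'de']
print(repr(data2))  # ['a', 'bcde']
print(hash_for(data1) != hash_for(data2))  # True
```

Because the representation includes the list punctuation, the order of the elements and the way the characters are split between them both affect the result.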
Upvotes: 13
Reputation: 292
Depending on what you want, you can either hash all the strings concatenated or hash each string separately. You can get the first by following Thomas's solution, since m.update(a); m.update(b) is equivalent to m.update(a+b). For the latter, use the solution below:
import hashlib


def generateHash(data):
    # Hash each string separately and return the list of hex digests
    return [hashlib.md5(i.encode('utf-8')).hexdigest() for i in data]
Note that it returns a list. Each element is the hash of the corresponding element in the given list of strings.
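The equivalence between successive update() calls and a single call on the concatenation can be checked with a quick sketch (the sample byte strings are made up for illustration):

```python
import hashlib

a = 'abc'.encode('utf-8')
b = 'de'.encode('utf-8')

# Feed the two pieces one after the other...
m1 = hashlib.md5()
m1.update(a)
m1.update(b)

# ...or feed their concatenation in one call.
m2 = hashlib.md5()
m2.update(a + b)

print(m1.hexdigest() == m2.hexdigest())  # True
```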
Upvotes: 2
Reputation: 55283
If you'd like to hash a list of strings, a naive solution could be:
import hashlib


def hash_string_list(string_list):
    h = hashlib.md5()
    for s in string_list:  # Note that you could use ''.join(string_list) instead
        h.update(s.encode('utf-8'))  # On Python 2 byte strings, h.update(s) is enough
    return h.hexdigest()
However, be wary that ['abc', 'efg'] and ['a', 'bcefg'] would hash to the same value.
If you provide more context regarding your objective, other solutions might be more appropriate.
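If that collision matters for your use case, one possible workaround is to make the element boundaries part of the hashed input, for example by prefixing each string with its length. This is only a sketch under that assumption; hash_string_list_unambiguous is a hypothetical name and the length-prefix format is an arbitrary choice:

```python
import hashlib


def hash_string_list_unambiguous(string_list):
    h = hashlib.md5()
    for s in string_list:
        encoded = s.encode('utf-8')
        # Prefix each item with its byte length and a ':' so that the
        # boundaries between items are part of what gets hashed.
        h.update(str(len(encoded)).encode('ascii') + b':')
        h.update(encoded)
    return h.hexdigest()


h1 = hash_string_list_unambiguous(['abc', 'efg'])
h2 = hash_string_list_unambiguous(['a', 'bcefg'])
print(h1 != h2)  # True
```

With this scheme the first list is hashed as 3:abc3:efg and the second as 1:a5:bcefg, so the two inputs no longer coincide.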
Upvotes: 1