Jelle De Loecker

Reputation: 21985

Hashing an array or object in python 3

I want to hash a simple array of strings. The documentation says you can't simply feed a string into hashlib's update() function, so I tried a regular variable, but then I got the error "TypeError: object supporting the buffer API required".

Here's what I have so far:

def generateHash(data):
    # Prepare the project id hash
    hashId = hashlib.md5()

    hashId.update(data)

    return hashId.hexdigest()

Upvotes: 9

Views: 18934

Answers (3)

pepr

Reputation: 20794

You can use the repr() function to get the (Unicode) string representation of the array (or of any object that implements conversion to a representation). Then encode the string to UTF-8 (the byte order is the same everywhere when using UTF-8). The resulting bytes can be hashed the way you tried above:

#!python3
import hashlib

def hashFor(data):
    # Prepare the project id hash
    hashId = hashlib.md5()

    hashId.update(repr(data).encode('utf-8'))

    return hashId.hexdigest()


if __name__ == '__main__':
    data1 = ['abc', 'de']
    data2 = ['a', 'bcde']
    print(hashFor(data1) + ':', data1)
    print(hashFor(data2) + ':', data2)

It prints on my console:

c:\tmp\___python\skerit\so17412304>py a.py
d26d27d8cbb7c6fe50637155c21d5af6: ['abc', 'de']
dbd5ab5df464b8bcee61fe8357f07b6e: ['a', 'bcde']

Upvotes: 13

Moh Zah

Reputation: 292

It depends on what you want: a single hash of all the strings concatenated, or a separate hash for each string. You can get the first by following Thomas's solution, since m.update(a); m.update(b) is equivalent to m.update(a+b). For the latter, use the solution below:

import hashlib

def generateHash(data):
    # Hash each string separately and return the list of hex digests
    return [hashlib.md5(i.encode('utf-8')).hexdigest() for i in data]

Note that it returns a list. Each element is the hash of the corresponding element in the given list of strings.
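
To illustrate the equivalence between successive update() calls and a single call on the concatenation, here is a minimal sketch. The sample strings and the variable names (incremental, combined, same_digest) are only illustrative assumptions:

```python
import hashlib

# Sample inputs (arbitrary); update() needs bytes, so encode first.
a = 'abc'.encode('utf-8')
b = 'de'.encode('utf-8')

# Feed the two chunks one after the other.
incremental = hashlib.md5()
incremental.update(a)
incremental.update(b)

# Feed the concatenation in a single call.
combined = hashlib.md5()
combined.update(a + b)

same_digest = incremental.hexdigest() == combined.hexdigest()
print(same_digest)  # True
```

This is also why hashing the elements one by one into the same hash object loses the boundaries between them.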

Upvotes: 2

Thomas Orozco

Reputation: 55283

If you'd like to hash a list of strings, a naive solution could be:

import hashlib

def hash_string_list(string_list):
    h = hashlib.md5()
    for s in string_list:  # Note that you could hash ''.join(string_list) instead
        h.update(s.encode('utf-8'))  # Python 3: update() needs bytes, not str
    return h.hexdigest()

However, be aware that ['abc', 'efg'] and ['a', 'bcefg'] would hash to the same value, because both lists concatenate to the same string.
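
One possible way around that collision, sketched here as an assumption rather than a recommendation for your specific case, is to prefix each element with its byte length so the element boundaries become part of the hashed data. The function names hash_string_list_naive and hash_string_list_framed are hypothetical and exist only for this comparison:

```python
import hashlib

def hash_string_list_naive(string_list):
    # Plain concatenation: element boundaries are lost.
    h = hashlib.md5()
    for s in string_list:
        h.update(s.encode('utf-8'))
    return h.hexdigest()

def hash_string_list_framed(string_list):
    # Hypothetical variant: prefix each element with its byte length and ':'
    # so that ['abc', 'efg'] and ['a', 'bcefg'] feed different byte streams.
    h = hashlib.md5()
    for s in string_list:
        encoded = s.encode('utf-8')
        h.update(str(len(encoded)).encode('ascii') + b':')
        h.update(encoded)
    return h.hexdigest()

list1 = ['abc', 'efg']
list2 = ['a', 'bcefg']
print(hash_string_list_naive(list1) == hash_string_list_naive(list2))    # True
print(hash_string_list_framed(list1) == hash_string_list_framed(list2))  # False
```

The framed version hashes the bytes 3:abc3:efg for the first list and 1:a5:bcefg for the second, so the two lists no longer share a digest.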

If you provide more context regarding your objective, other solutions might be more appropriate.

Upvotes: 1
