Deal with buffer in Python 2.x and 3.x

Question

I'm trying to compute a rough hash of a file in both Python 2.x and 3.x. I must use this hash function, not a built-in one.

Using

get_file_hash("my-file.txt")

This works on 3.x. On 2.x it raises an error because the type of the incoming value is 'str'.

Error says

    value = content[0] << 7
    TypeError: unsupported operand type(s) for <<: 'str' and 'int'

Here's the code

def c_mul(a,b):
    return eval(hex((int(a) * b) & 0xFFFFFFFF)[:-1])

def get_hash(content):
    value = 0
    if len(content) > 0:
        print (type(content))
        print (type(content[0]))
        value = content[0] << 7
        for char in content:
            value = c_mul(1000003, value) ^ char
        value = value ^ len(content)
        if value == -1:
            value = -2
    return value

def get_file_hash(filename):
    with open(filename, "rb") as pyfile:
        return get_hash(pyfile.read())

How can I fix get_hash or get_file_hash so this works on 2.x and 3.x?

falsetru · Accepted Answer

For a file opened in binary mode, file.read() returns bytes in Python 3 and str (which is the same type as bytes) in Python 2.

But iterating over these objects yields different results in the two versions:

>>> list(b'123') # In Python 3.x, yields `int`s
[49, 50, 51]
>>> list(b'123') # In Python 2.x, yields one-character `str`s
['1', '2', '3']

Use bytearray. Iterating over it yields ints in both versions.

>>> list(bytearray(b'123')) # Python 3.x
[49, 50, 51]
>>> list(bytearray(b'123')) # Python 2.x
[49, 50, 51]

def c_mul(a,b):
    return (a * b) & 0xFFFFFFFF

def get_hash(content):
    content = bytearray(content)  # indexing/iterating a bytearray yields ints in 2.x and 3.x
    value = 0
    if len(content) > 0:
        value = content[0] << 7
        for char in content:
            value = c_mul(1000003, value) ^ char
        value = value ^ len(content)
        if value == -1:
            value = -2
    return value

def get_file_hash(filename):
    with open(filename, "rb") as pyfile:
        return get_hash(pyfile.read())
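One way to sanity-check the fixed functions is to hash the same bytes both in memory and through a file. The sketch below repeats the functions above so it runs on its own; `hash_sample` is a hypothetical helper added only for this check and is not part of the original answer.

```python
import os
import tempfile


def c_mul(a, b):
    # 32-bit wrapping multiply
    return (a * b) & 0xFFFFFFFF


def get_hash(content):
    content = bytearray(content)  # ints on both Python 2 and Python 3
    value = 0
    if len(content) > 0:
        value = content[0] << 7
        for char in content:
            value = c_mul(1000003, value) ^ char
        value = value ^ len(content)
        if value == -1:
            value = -2
    return value


def get_file_hash(filename):
    with open(filename, "rb") as pyfile:
        return get_hash(pyfile.read())


def hash_sample(data):
    """Hypothetical helper: hash data directly and via a temporary file."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        return get_hash(data), get_file_hash(path)
    finally:
        os.remove(path)


memory_hash, file_hash = hash_sample(b"my file contents")
print(memory_hash == file_hash)
```

Because `get_hash` converts its input to a bytearray first, it should accept bytes, str (on Python 2), or an existing bytearray and treat them identically.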

BTW, I modified c_mul so that it no longer uses hex and eval. (I assumed you used them to remove the trailing L in Python 2.x.)

>>> hex(289374982374)
'0x436017d0e6L'
#            ^
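Python 3 never appends an `L` to the result of hex(), and Python 2 only does so when the value is a long, so slicing with `[:-1]` can silently drop a real hex digit. The sketch below compares the masked multiply with a round trip through hex that strips only an actual trailing `L`; `c_mul_via_hex` is a hypothetical variant written for this comparison, not code from the question or the answer.

```python
def c_mul(a, b):
    # Mask to 32 bits directly; no string round trip needed.
    return (a * b) & 0xFFFFFFFF


def c_mul_via_hex(a, b):
    # Hypothetical variant: keep the hex round trip, but remove only a
    # real trailing "L" (Python 2 long) instead of slicing blindly.
    text = hex((a * b) & 0xFFFFFFFF)
    return int(text.rstrip("L"), 16)


masked = c_mul(1000003, 12416)
round_trip = c_mul_via_hex(1000003, 12416)
print(masked == round_trip)

# Blind slicing removes the last character whatever it is: the "L" on a
# Python 2 long, but a real hex digit wherever hex() emits no suffix.
blindly_sliced = hex(masked)[:-1]
```

The masked version avoids both eval and any dependence on how hex() formats its output.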
