How exactly do the hashlib hashers treat input?

Question

The Python 2.7 documentation has this to say about the hashlib hashers:

hash.update(arg)

    Update the hash object with the string arg. [...]

But I have seen people feed it objects that are not strings, e.g. buffers or numpy ndarrays.

Given Python's duck typing, I'm not surprised that it is possible to specify non-string arguments.

The question is: how do I know the hasher is doing the right thing with the argument?

I can't imagine the hasher naïvely doing a shallow iteration on the argument because that would probably fail miserably with ndarrays with more than one dimension - if you do a shallow iteration, you get an ndarray with n-1 dimensions.

orlp · Accepted Answer

update unpacks its argument using the s# format spec. This means the argument can be a string, a Unicode object, or any object that exposes the buffer interface.

You can't define a buffer interface in pure Python, but C libraries like numpy can and do - which allows them to be passed into hash.update.

Multi-dimensional arrays work fine: at the C level their data is typically stored as one contiguous series of bytes, and those bytes are what the hasher reads through the buffer interface.
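
One way to convince yourself of this is to compare digests. The following is a minimal sketch, assuming Python 3 (where the buffer interface is the buffer protocol) and using only standard-library types in place of numpy; the helper name digest_of and the choice of SHA-256 are purely illustrative. It hashes the same six bytes three ways: as a bytes object, as a flat array.array, and as a 2x3 memoryview.

```python
import hashlib
from array import array

# Six raw bytes that every object below will expose.
raw = bytes(range(6))

# Reference digest: hash the raw bytes directly.
expected = hashlib.sha256(raw).hexdigest()


def digest_of(obj):
    """Hash any object that exposes the buffer protocol (hypothetical helper)."""
    h = hashlib.sha256()
    h.update(obj)
    return h.hexdigest()


# A flat, typed array backed by the same six bytes.
flat = array('B', raw)

# A two-dimensional (2 x 3) view over the same memory.
grid = memoryview(raw).cast('B', shape=[2, 3])

# The hasher reads the underlying bytes, not the Python-level items,
# so the shape of the object does not affect the digest.
same_flat = digest_of(flat) == expected
same_grid = digest_of(grid) == expected
```

If both comparisons hold, the hasher is reading the underlying memory rather than iterating over elements. A contiguous numpy ndarray should behave like grid here, though that is not exercised in this sketch.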
