Bogdan Ruzhitskiy
Bogdan Ruzhitskiy

Reputation: 1237

Hash for lambda function in Python

I'm trying to get the hash of a lambda function. Why do I get two values (8746164008739 and -9223363290690767077)? Why is the hash from the lambda function not always one value?

>>> fn = lambda: 1
>>> hash(fn)
-9223363290690767077
>>> fn = lambda: 1
>>> hash(fn)
8746164008739
>>> fn = lambda: 1
>>> hash(fn)
-9223363290690767077
>>> fn = lambda: 1
>>> hash(fn)
8746164008739
>>> fn = lambda: 1
>>> hash(fn)
-9223363290690767077

Upvotes: 26

Views: 4570

Answers (6)

user21739987
user21739987

Reputation: 1

hash_func = lambda input_str: hashlib.sha256(input_str.encode()).hexdigest()

Upvotes: 0

Erik Aronesty
Erik Aronesty

Reputation: 12935

Implementation-dependent way to get reasonably sufficient contents of a lambda in python3:

def function_contents(func):
    closure = tuple(cell.cell_contents for cell in func.__closure__) if func.__closure__ else ()
    return (func.__name__, func.__defaults__, closure, func.__code__.co_code, func.__code__.co_consts)

This includes:

  • the code of the lambda
  • default captures
  • closure captures

Should be enough for your use case.

Your example:

fn = lambda: 1
hash(function_contents(fn))
fn = lambda: 1
hash(function_contents(fn))
fn = lambda: 1
hash(function_contents(fn))

... same value every time.

Upvotes: 0

NPE
NPE

Reputation: 500773

Two objects are not guaranteed to hash to the same value unless they compare equal [1].

Python functions (including lambdas) don't compare equal even if they have identical code [2]. For example:

>>> (lambda: 1) == (lambda: 1)
False

Implementation-wise, this behaviour is due to the fact that function objects don't provide their own equality operator. Instead, they inherit the default one that uses the object's identity, i.e. its address. From the documentation:

If no __cmp__(), __eq__() or __ne__() operation is defined, class instances are compared by object identity (“address”).

Here is what happens in your particular example:

fn = lambda: 1  # New function is allocated at address A and stored in fn.
fn = lambda: 1  # New function is allocated at address B and stored in fn.
                # The function at address A is garbage collected.
fn = lambda: 1  # New function is allocated at address A and stored in fn.
                # The function at address B is garbage collected.
fn = lambda: 1  # New function is allocated at address B and stored in fn.
                # The function at address A is garbage collected.
...

Since address A is always hashed to one value, and address B to another, you are seeing hash(fn) alternate between the two values. This alternating behaviour is, however, an implementation artefact and could change one day if, for example, the garbage collector were made to behave slightly differently.

The following insightful note has been contributed by @ruakh:

It is worth noting that it's not possible to write a general process for determining if two functions are equivalent. (This is a consequence of the undecidability of the halting problem.) Furthermore, two Python functions can behave differently even if their code is identical (since they may be closures referring to distinct-but-identically-named variables). So it makes sense that Python functions don't overload the equality operator: there's no way to implement anything better than the default object-identity comparison.

[1] The converse is generally not true: two objects that compare unequal can have the same hash value. This is called a hash collision.

[2] Calling your lambdas and then hashing the result would of course always give the same value since hash(1) is always the same within one program:

>>> (lambda: 1)() == (lambda: 1)()
True

Upvotes: 43

PM 2Ring
PM 2Ring

Reputation: 55499

Each time you do fn = lambda: 1 a new function object is created, and the old object bound to the name fn is marked for deletion. But Python doesn't simply deallocate the object, passing its memory back to the OS. To minimize system calls for memory allocation and to minimize memory fragmentation Python tries to recycle memory when it can. And so when you create fn = lambda: 1 a third time the interpreter notices it's got a block of RAM handy that's big enough for the new function object, and so it uses that block. And thus your 3rd fn ends up in that block of RAM and hence has the same id as the first fn, since the id of CPython objects is their memory address.

(As others have mentioned the hash of any object type that doesn't provide a specific implementation of __hash__ is based on its id in CPython. And if a class does not define a __cmp__ or __eq__ method it should not define a __hash__ operation either).

Upvotes: 5

user1804599
user1804599

Reputation:

Deciding whether two functions are equal is impossible, as it is a superset of the halting problem.

In an ideal world, comparing (and therefore hashing) functions would result in a type error. Python apparently doesn't like this, and instead chooses to use the identity of the functions to compare (and therefore hash) them.

Upvotes: 5

Alex Riley
Alex Riley

Reputation: 176978

The hash of a lambda function object is based on its memory address (in CPython this is what the id function returns). This means that any two function objects will have different hashes (assuming there are no hash collisions), even if the functions contain the same code.

To explain what's happening in the question, first note that writing fn = lambda: 1 creates a new function object in memory and binds the name fn to it. This new function will therefore have a different hash value to any existing functions.

Repeating fn = lambda: 1, you get alternating values for the hashes because when fn is bound to the newly created function object, the function that fn previously pointed to is garbage collected by Python. This is because there are no longer any references to it (since the name fn now points to a different object).

The Python interpreter then reuses this old memory address for the next new function object created by writing fn = lambda: 1.

This behaviour might vary between different systems and Python implementations.

Upvotes: 10

Related Questions