Hashing the code of a Python function (including lambdas)

Question

I'm trying to make a simple intermediate compilation function:

lst = [lambda x: calculate_urls(),
       lambda x: join_urls_with_ids(x, args[0]),
       lambda x: bucket_urls(x)]
intermediate.execute(lst)

intermediate.execute will run each item in the list. The first function's output is the second function's input, and so on. The lambdas are used to make each function take only one argument. (The first input value is None, so it's effectively ignored here.)

At each intermediate step the output is pickled and saved (currently just to /tmp). If intermediate output already exists for a function, that function is skipped and execution moves on to the next step.

I'm looking for a way to detect changes in the function code, and I figured hashing the code would be a quick way of doing that. I want it to detect if someone changed the implementation of a function, so that it would execute the function again and ignore the cached value.

I found out about func_code, which is hashable AFAIK. However, it only works for that particular function, which means that the hash only changes if the lambdas in the code above change, not the functions being called by the lambdas. Is what I'm looking for theoretically possible? Is there a reasonable middle ground?

BrenBarn · Accepted Answer

Instead of storing lambdas, you could store a list of (function, arguments) tuples, and hash the func_code of those functions. That is, instead of what you have, do:

lst = [
    (calculate_urls, ()),
    (join_urls_with_ids, (args[0],)),  # trailing comma makes this a one-element tuple
    (bucket_urls, ())
]

Then make intermediate.execute be like:

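def execute(lst):
    result = ()  # output of the previous step, wrapped in a tuple
    for func, args in lst:
        # Pass the previous output (if any) followed by this step's extra arguments.
        result = (func(*(result + args)),)
    return result[0] if result else None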
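To tie this back to the caching described in the question, here is a minimal sketch of how the code hash could key the pickled results. It assumes Python 3, where func_code is spelled `__code__`. The names `code_fingerprint`, `execute_cached`, and `cache_dir` are made up for illustration, and using `repr` of the extra arguments as part of the key is an assumption that only holds for arguments with a stable `repr`.

```python
import hashlib
import os
import pickle
import types


def code_fingerprint(code):
    """Collect bytecode, referenced names, and constants of a code object as bytes."""
    parts = [code.co_code, repr(code.co_names).encode("utf-8")]
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            # Nested functions and comprehensions are stored as code constants.
            parts.append(code_fingerprint(const))
        else:
            parts.append(repr(const).encode("utf-8"))
    return b"".join(parts)


def execute_cached(lst, cache_dir):
    """Run (func, extra_args) steps in order, reusing pickled results when possible."""
    # Running digest: a change in an earlier step also changes the key of later steps.
    chain = hashlib.sha256()
    result = ()
    for index, (func, extra_args) in enumerate(lst):
        chain.update(code_fingerprint(func.__code__))
        chain.update(repr(extra_args).encode("utf-8"))  # assumes a stable repr
        path = os.path.join(cache_dir, f"step{index}_{chain.hexdigest()}.pkl")
        if os.path.exists(path):
            # Cached output for this exact chain of code and arguments.
            with open(path, "rb") as handle:
                value = pickle.load(handle)
        else:
            value = func(*(result + extra_args))
            with open(path, "wb") as handle:
                pickle.dump(value, handle)
        result = (value,)
    return result[0] if result else None
```

Because the functions are stored directly rather than wrapped in lambdas, the fingerprint covers the code of calculate_urls and friends themselves. It still does not follow calls those functions make into other helpers.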

However, it's a little hard to grasp what you're trying to do here. Is the idea that the function implementations might change between different invocations of the program? (That is, run, save cache, exit, change the source code of calculate_urls, then run again?) If so, you might be better off hashing the function's source code instead (using inspect.getsource). Python bytecode (which is what's in func_code) is not a stable target: the same source can compile to different bytecode under different Python versions.
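A rough sketch of the source-hashing idea, assuming Python 3. `function_fingerprint` is a hypothetical helper name, and the fallback to bytecode when no source is available is my own addition rather than something the approach requires.

```python
import hashlib
import inspect


def function_fingerprint(func):
    """Return a SHA-256 hex digest that changes when the function's source changes."""
    try:
        # Preferred: hash the source text, which does not depend on the Python version.
        payload = inspect.getsource(func).encode("utf-8")
    except OSError:
        # Source not available (for example, a function typed into an
        # interactive session): fall back to version-dependent bytecode.
        code = func.__code__
        payload = code.co_code + repr(code.co_consts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Two caveats: the digest also changes when only comments or whitespace change, and for a lambda inspect.getsource may return the whole surrounding line rather than just the lambda expression. Like the func_code approach, this only covers the function you pass in, not the functions it calls.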
