casparjespersen
casparjespersen

Reputation: 3840

Hashing the content of a method in Python

I am looking for a robust way to hash/serialize the content of a method in Python.

Use-case: We are doing some file caching the result of a transformation function, and it would be great if it was possible to automatically refresh if transformation function has changed:

cached_file = get_cached_filename(
    data_version=data.version,
    transform_version=1,  # increment this when making updates
)

# Return the cached file if present
if cached_file.exists() and not overwrite:
    logger.debug("Reading dataset from local cache")
    return joblib.load(cached_file)
    
logger.debug("Downloading dataset and storing to local cache")

# Do the expensive data download and transform
df = download_and_apply_transforms(data)

# Store dataframe to local data cache
joblib.dump(df, cached_file)

return df

I am looking for something that could potentially replace my hardcoding of a version in transform_version=1, but rather read a hash from the method itself.

I tried using the built-in hash method of a callable. I.e. download_and_apply_transforms.__hash__(), but it seems to update on each re-run, so it likely bases itself on memory location somehow.

Upvotes: 2

Views: 78

Answers (1)

svfat
svfat

Reputation: 3363

Good question. Not sure if it is the best approach, but you can get the function source code using inspect module, like this:

inspect.getsource(foo)

That will return the source as a string, so you can then get the hash to get a cache key by running some hashing function.

Upvotes: 2

Related Questions