Reputation: 3840
I am looking for a robust way to hash/serialize the content of a method in Python.
Use-case: We are doing some file caching the result of a transformation function, and it would be great if it was possible to automatically refresh if transformation function has changed:
cached_file = get_cached_filename(
data_version=data.version,
transform_version=1, # increment this when making updates
)
# Return the cached file if present
if cached_file.exists() and not overwrite:
logger.debug("Reading dataset from local cache")
return joblib.load(cached_file)
logger.debug("Downloading dataset and storing to local cache")
# Do the expensive data download and transform
df = download_and_apply_transforms(data)
# Store dataframe to local data cache
joblib.dump(df, cached_file)
return df
I am looking for something that could potentially replace my hardcoding of a version in transform_version=1
, but rather read a hash from the method itself.
I tried using the built-in hash method of a callable. I.e. download_and_apply_transforms.__hash__()
, but it seems to update on each re-run, so it likely bases itself on memory location somehow.
Upvotes: 2
Views: 78
Reputation: 3363
Good question. Not sure if it is the best approach, but you can get the function source code using inspect
module, like this:
inspect.getsource(foo)
That will return the source as a string, so you can then get the hash to get a cache key by running some hashing function.
Upvotes: 2