Reputation: 36930
We have a codebase with the following use pattern:
factory = DataFactory(args)
dataset = factory.download_and_cache_big_dataset(key)
metadata = dataset.get_some_metadata()
Currently, download_and_cache_big_dataset fetches a very large file from S3 and puts it somewhere. Among other things, it does:
filename = get_s3_key(key)
filepath = os.path.join(get_tmp_dir(), filename)
s3.download_file(key, filepath)
return BigFileClass(filepath) # gets stored in a class somewhere
However, this file doesn't get deleted. This is fine when this function is called sparingly and relies on file caching, but bad when it is called repeatedly and we don't want to fill up the disk. Is there a way to refactor the code with a context manager such that we can use it as
factory = DataFactory(args)
with factory.download_and_cache_big_dataset(key) as dataset:
    metadata = dataset.get_some_metadata()
    # do something with metadata
# file gets automatically deleted
But critically, without breaking the existing usage, so that the other code works as is? Or will there need to be a different method that returns the context manager?
Upvotes: 1
Views: 435
Reputation: 43533
Since you return an instance of BigFileClass to handle/represent the data, I would suggest the following.
I'm assuming that the data file is unique to each instance.
- Modify BigFileClass to keep track of the path of the data file.
- Add a __del__ method to BigFileClass in which the data file is removed.
Edit: If you want to use BigFileClass as a context manager, define __enter__ and __exit__ methods for BigFileClass. The only thing that __enter__ has to do in this case is return self.
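As a rough sketch of the context manager part only: the class body below is a hypothetical stand-in (the real BigFileClass is not shown in the question), and get_some_metadata returns a placeholder value. It is meant to show that adding __enter__ and __exit__ leaves the existing call style working.

```python
class BigFileClass:
    """Hypothetical stand-in for the real class; it only stores the path."""

    def __init__(self, filepath):
        self.filepath = filepath

    def get_some_metadata(self):
        # Placeholder for whatever the real class reads from the file.
        return {"path": self.filepath}

    def __enter__(self):
        # Nothing to set up: hand the instance to the `as` target.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Returning False lets any exception raised in the block propagate.
        return False


# Existing call sites keep working unchanged.
dataset = BigFileClass("/tmp/example.bin")
metadata = dataset.get_some_metadata()

# The same class can now also be used in a with statement.
with BigFileClass("/tmp/example.bin") as managed:
    managed_metadata = managed.get_some_metadata()
```

Because __enter__ returns self, the factory method can keep returning a BigFileClass instance and both usages share one code path.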
I would leave the task of deleting the file to the __del__ method (which runs when the reference count for a BigFileClass instance reaches 0). It doesn't feel right to have the class instance still around when you have already deleted the data file.
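A minimal sketch of the __del__ idea, under some assumptions: the class is again a hypothetical stand-in, a temporary file plays the role of the S3 download, and the prompt cleanup shown relies on CPython's reference counting (other interpreters may run __del__ later).

```python
import os
import tempfile


class BigFileClass:
    """Hypothetical stand-in that owns the data file at `filepath`."""

    def __init__(self, filepath):
        self.filepath = filepath

    def __del__(self):
        # Remove the data file when the instance is destroyed.
        # Errors are swallowed: the file may already be gone, or
        # __init__ may have failed before filepath was set.
        try:
            os.remove(self.filepath)
        except (OSError, AttributeError):
            pass


# Stand-in for the download: create a throwaway file on disk.
fd, path = tempfile.mkstemp()
os.close(fd)

dataset = BigFileClass(path)
exists_while_referenced = os.path.exists(path)

del dataset  # drop the last reference; CPython calls __del__ right away
exists_after_del = os.path.exists(path)
```

One thing to keep in mind with this design: a name bound by `with ... as dataset` stays bound after the block ends, so the file is only removed once that name is deleted, rebound, or goes out of scope.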
Remark w.r.t. architecture.
The use of a factory seems like an unnecessary complication to me. IMO, download_and_cache_big_dataset could just be a function returning a BigFileClass instance.
Upvotes: 1