Andrew Mao

Reputation: 36930

Implement an optional context manager in Python

We have a codebase with the following use pattern:

factory = DataFactory(args)
dataset = factory.download_and_cache_big_dataset(key)
metadata = dataset.get_some_metadata()

Currently, download_and_cache_big_dataset fetches a very large file from S3 and puts it somewhere. Among other things, it does

filename = get_s3_key(key)
filepath = os.path.join(get_tmp_dir(), filename)
s3.download_file(key, filepath)
return BigFileClass(filepath) # gets stored in a class somewhere

However, this file doesn't get deleted. This is fine when this function is called sparingly and relies on file caching, but bad when it is called repeatedly and we don't want to fill up the disk. Is there a way to refactor the code with a context manager such that we can use it as

factory = DataFactory(args)
with factory.download_and_cache_big_dataset(key) as dataset:
    metadata = dataset.get_some_metadata()
    # do something with metadata

# file gets automatically deleted

Critically, this must not break the existing usage, so that the other code keeps working as is. Or will there need to be a different method that returns the context manager?

Upvotes: 1

Views: 435

Answers (1)

Roland Smith

Reputation: 43533

Since you return an instance of BigFileClass to handle/represent the data, I would suggest the following.

I'm assuming that the data file is unique to each instance.

  • Add an instance variable to BigFileClass to keep track of the path of the data file.
  • Add a __del__ method to BigFileClass in which the data file is removed.
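The two steps above could be sketched roughly as follows. This is a minimal sketch, not the actual class from the question: the body of BigFileClass, the attribute name filepath, and the placeholder get_some_metadata are assumptions, and a temporary file stands in for the S3 download.

```python
import os
import tempfile


class BigFileClass:
    """Hypothetical stand-in for the class in the question."""

    def __init__(self, filepath):
        # Remember where the data file lives so it can be removed later.
        self.filepath = filepath

    def get_some_metadata(self):
        # Placeholder: the real class would read metadata from the file.
        return {"size": os.path.getsize(self.filepath)}

    def __del__(self):
        # Runs when the instance is garbage collected.
        # getattr guards against __init__ having failed before filepath was set.
        path = getattr(self, "filepath", None)
        if path is not None:
            try:
                os.remove(path)
            except OSError:
                pass  # file already gone or not removable; nothing more to do


# Usage: an empty temporary file stands in for the downloaded dataset.
fd, path = tempfile.mkstemp()
os.close(fd)

dataset = BigFileClass(path)
metadata = dataset.get_some_metadata()
del dataset  # in CPython the last reference is gone, so __del__ runs here
```

Note that the immediate cleanup relies on CPython's reference counting. Other interpreters, or objects caught in reference cycles, may run __del__ later.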

Edit: If you want to use BigFileClass as a context manager, define __enter__ and __exit__ methods for BigFileClass. In this case, all that __enter__ has to do is return self.

I would leave the task of deleting the file to the __del__ method (which runs when the reference count for a BigFileClass instance reaches 0). It doesn't feel right to have the class instance still around when you have already deleted the data file.
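A sketch of how that could look, with __enter__ returning self, __exit__ doing no cleanup, and deletion left to __del__. As before, the class body and the helper make_fake_download are hypothetical and only stand in for the real download code.

```python
import os
import tempfile


class BigFileClass:
    """Hypothetical stand-in for the class in the question."""

    def __init__(self, filepath):
        self.filepath = filepath

    def __enter__(self):
        # Nothing to set up; hand the instance to the `as` target.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # No cleanup here; returning False lets exceptions propagate.
        return False

    def __del__(self):
        # The data file is removed once the instance is garbage collected.
        path = getattr(self, "filepath", None)
        if path is not None:
            try:
                os.remove(path)
            except OSError:
                pass


def make_fake_download():
    # Hypothetical helper: an empty temp file instead of an S3 download.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    return path


# Existing call style keeps working unchanged.
plain_path = make_fake_download()
dataset = BigFileClass(plain_path)
existed_plain = os.path.exists(dataset.filepath)
del dataset

# New call style with a `with` block.
with_path = make_fake_download()
with BigFileClass(with_path) as ds:
    existed_inside = os.path.exists(ds.filepath)
still_there_after_with = os.path.exists(with_path)
del ds  # dropping the last reference triggers __del__ in CPython
```

One consequence of this design: the name bound by `as` still refers to the instance after the with block ends, so the file remains until that name is deleted, rebound, or goes out of scope. If the file must disappear exactly at the end of the block, the removal would have to happen in __exit__ instead.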


Remark w.r.t. architecture.

The use of a factory seems like an unnecessary complication to me. IMO, download_and_cache_big_dataset could just be a function returning a BigFileClass instance.

Upvotes: 1
