Philip Ciunkiewicz

Reputation: 2791

Is there any benefit to deleting a reference to a large Python object before overwriting that reference?

I am running some memory-heavy scripts which iterate over documents in a database, and due to memory constraints on the server I manually delete references to the large object at the conclusion of each iteration:

for document in database:
    initial_function_calls()

    big_object = memory_heavy_operation(document)
    save_to_file(big_object)

    del big_object

    additional_function_calls()

The initial_function_calls() and additional_function_calls() are each slightly memory-heavy. Is there any benefit to explicitly deleting the reference to the large object so that it can be garbage collected? Or is it enough to leave the name alone and let it point to a new object in the next iteration?

Upvotes: 2

Views: 210

Answers (1)

Roland Smith

Reputation: 43505

As is often the case, it depends. :-/

I'm assuming we're talking about CPython here.

Using del or re-assigning a name reduces the reference count for an object. Only if that reference count reaches 0 can the object be de-allocated. So if you inadvertently stashed a reference to big_object away somewhere, using del won't help.
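To make the reference-count point concrete, here is a minimal sketch that assumes CPython. The Big class and both helper functions are hypothetical stand-ins, not part of the question's code. A weak reference is used as a probe because it does not keep the object alive. The second helper also shows a detail of re-assignment: the new object is built before the name is rebound, so the old and new objects briefly exist at the same time.

```python
import weakref

class Big:
    """Hypothetical stand-in for a large object."""
    def __init__(self):
        self.payload = bytearray(1_000_000)

def freed_after_del(stash=None):
    big_object = Big()
    probe = weakref.ref(big_object)  # a weak reference does not keep it alive
    if stash is not None:
        stash.append(big_object)     # a second, easily forgotten reference
    del big_object
    return probe() is None           # True if the object was de-allocated

def rebind_without_del():
    big_object = Big()
    probe = weakref.ref(big_object)
    alive_during_build = []

    def build():
        # Runs before the name is rebound, so the old object still exists here.
        alive_during_build.append(probe() is not None)
        return Big()

    big_object = build()             # old object is released only after this
    return alive_during_build[0], probe() is None

print(freed_after_del())           # True on CPython: no other references
print(freed_after_del(stash=[]))   # False: the stashed reference keeps it alive
print(rebind_without_del())        # (True, True) on CPython
```

In a loop like the one in the question, this suggests that del mainly lowers the peak: without it, the previous big_object stays alive through additional_function_calls(), initial_function_calls() and the next memory_heavy_operation() call.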

When garbage collection is triggered depends on the amount of allocations and de-allocations. See the documentation for gc.set_threshold().
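As a small illustration, the current thresholds and counters can be inspected with gc.get_threshold() and gc.get_count(). The value 10_000 below is an arbitrary example, not a recommendation.

```python
import gc

original = gc.get_threshold()   # (threshold0, threshold1, threshold2)
print("thresholds:", original)
print("counts:", gc.get_count())

# Raising threshold0 makes automatic collections less frequent.
gc.set_threshold(10_000, original[1], original[2])
raised = gc.get_threshold()
print("raised:", raised)

# Restore the original settings.
gc.set_threshold(*original)
```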

If you're pretty sure that there are no further references, you could use gc.collect() to force a garbage collection run. That might help if your code doesn't do a lot of other allocations.
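A forced collection matters most for reference cycles, which reference counting alone cannot free. The following sketch assumes CPython; the Node class is a hypothetical example. Automatic collection is disabled for the duration so that it cannot interfere with the demonstration.

```python
import gc
import weakref

class Node:
    """Hypothetical object that can point at another Node."""
    def __init__(self):
        self.other = None

gc.disable()                      # keep automatic collection out of the demo
a, b = Node(), Node()
a.other, b.other = b, a           # a and b now reference each other
probe = weakref.ref(a)

del a, b                          # reference counts stay above 0 due to the cycle
alive_before = probe() is not None

unreachable = gc.collect()        # number of unreachable objects found
alive_after = probe() is not None
gc.enable()

print(alive_before, alive_after, unreachable)
```

If big_object is not part of a cycle, del alone is enough to release it and gc.collect() adds nothing for that object.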

One thing to keep in mind is that if big_object is created by a C extension module (e.g. numpy), the module may manage its own memory. In that case garbage collection won't affect it! Also, small integers and some short strings are cached by the interpreter and are not freed when you drop a reference to them. You can use gc.is_tracked() to check whether an object is tracked by the cyclic garbage collector.
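A quick check with gc.is_tracked(), assuming CPython. Note that an untracked object is simply one that cannot take part in a reference cycle; it is still released by reference counting once its last reference goes away.

```python
import gc

samples = {
    "int": 0,
    "str": "a",
    "bytes": bytes(10),
    "list": [],
}

# Atomic objects such as int, str and bytes are not tracked; containers like list are.
tracked = {name: gc.is_tracked(obj) for name, obj in samples.items()}
for name, flag in tracked.items():
    print(f"{name:5} tracked={flag}")
```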

What I would suggest is that you run your program with and without del+gc.collect(), and monitor the amount of RAM used. On UNIX-like systems, look at the resident set size. You could also use sys._debugmallocstats().
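One rough way to take that measurement from inside the script on a UNIX-like system is the standard resource module. This is only a sketch: memory_heavy_operation() below is a hypothetical stand-in for the real one, ru_maxrss reports the peak resident set size rather than the current one, and its unit is kilobytes on Linux but bytes on macOS.

```python
import gc
import resource
import sys

def peak_rss_mib():
    """Peak resident set size of this process in MiB (UNIX-like systems only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor

def memory_heavy_operation(size):
    # Hypothetical stand-in: allocates `size` bytes.
    return bytearray(size)

readings = []
for i in range(3):
    big_object = memory_heavy_operation(50_000_000)
    del big_object
    gc.collect()
    readings.append(peak_rss_mib())
    print(f"iteration {i}: peak RSS {readings[-1]:.1f} MiB")
```

Run it once as shown and once with the del and gc.collect() lines removed, then compare how the readings develop over the iterations. For the current rather than the peak value, an external tool such as ps or top is an alternative.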

Unless you see the resident set size grow and grow, I wouldn't worry about it.

Upvotes: 2
