Reputation: 268
I have 50 pickle files that are 0.5 GB each. Each pickle file is comprised of a list of custom class objects. I have no trouble loading the files individually using the following function:
import pickle

def loadPickle(fp):
    with open(fp, 'rb') as fh:
        listOfObj = pickle.load(fh)
    return listOfObj
However, when I try to iteratively load the files I get a memory leak.
l = ['filepath1', 'filepath2', 'filepath3', 'filepath4']
for fp in l:
    x = loadPickle(fp)
    print('loaded {0}'.format(fp))
My memory overflows before "loaded filepath2" is printed.
How can I write code that guarantees that only a single pickle is loaded during each iteration?
Answers to related questions on SO suggest using objects defined in the weakref module or explicit garbage collection using the gc module, but I am having a difficult time understanding how I would apply these methods to my particular use case. This is because I have an insufficient understanding of how referencing works under the hood.
Related: Python garbage collection
Upvotes: 15
Views: 6495
Reputation: 6326
You can fix that by adding x = None right after for fp in l:.

This works because it dereferences the variable x, allowing the Python garbage collector to free the previous list's memory before loadPickle() is called the second time.
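A minimal sketch of the adjusted loop, reusing loadPickle() and the list l from the question:

for fp in l:
    x = None              # drop the reference to the previous list of objects
    x = loadPickle(fp)    # only one file's worth of objects is alive here
    print('loaded {0}'.format(fp))

Because CPython frees an object as soon as its reference count drops to zero (and nothing else references these lists), the previous list is released before the next file is loaded, so at most one pickle is held in memory per iteration.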
Upvotes: 9