Reputation: 6808
I have a function that goes over all the lines in a file (~20 MiB) and puts them all into a list. The code is extremely simple: everything is done inside the function and nothing is returned, so it shouldn't have any leaks.
Here is the function:
def read_art_file(filename, path_to_dir):
    import codecs
    corpus = []
    corpus_file = codecs.open(path_to_dir + filename, 'r', 'iso-8859-15')
    newline = corpus_file.readline().strip()
    while newline != '':
        # we put into `article` a `newline` of the file and some other info
        # (I left those lists blank for readability)
        article = [newline, [], [], [], [], [], [], [], [], [], [], [], []]
        corpus.append(article)
        del newline
        del article
        newline = corpus_file.readline().strip()
    memory_usage('inside function')
    for article in corpus:
        for word in article:
            del word
        del article
    del corpus
    corpus_file.close()
    memory_usage('inside: after corp deleted')
    return
Here is the main code:
memory_usage('START')
path_to_dir = '/home/soshial/internship/training_data/parser_output/'
read_art_file('accounting.n.txt.wpr.art', path_to_dir)
memory_usage('outside func')
time.sleep(5)
memory_usage('END')
Each memory_usage call just prints the amount of KiB allocated by the script (its implementation is in the update below).
If I run the script, it gives me:
START memory: 6088 KiB
inside memory: 393752 KiB (20 MiB file + lists occupy 400 MiB)
inside: after corp deleted memory: 43360 KiB
outside func memory: 34300 KiB (34300-6088= 28 MiB leaked)
FINISH memory: 34300 KiB
And if I do absolutely the same thing, but with the appending of article to corpus commented out:

article = [newline, [], [], [], [], [], ...]  # we still assign data to `article`
# corpus.append(article)  # this line is commented out during the second run

then the output is:
START memory: 6076 KiB
inside memory: 6076 KiB
inside: after corp deleted memory: 6076 KiB
outside func memory: 6076 KiB
FINISH memory: 6076 KiB
Hence, this way all the memory is freed. I need all the memory to be freed, since I'm going to process hundreds of such files.
Am I doing something wrong, or is this a CPython interpreter bug?
UPD: This is how I check memory consumption (taken from another Stack Overflow question):
def memory_usage(text=''):
    """Memory usage of the current process in kilobytes."""
    status = None
    result = {'peak': 0, 'rss': 0}
    try:
        # This will only work on systems with a /proc file system
        # (like Linux).
        status = open('/proc/self/status')
        for line in status:
            parts = line.split()
            key = parts[0][2:-1].lower()
            if key in result:
                result[key] = int(parts[1])
    finally:
        if status is not None:
            status.close()
    print('>', text, 'memory:', result['rss'], 'KiB')
    return
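For reference, the parsing above picks the VmPeak: and VmRSS: lines out of /proc/self/status; the slightly cryptic parts[0][2:-1].lower() strips the "Vm" prefix and the trailing colon. A minimal sketch on a single sample line (the value is made up):

```python
# a line as it might appear in /proc/self/status (value is hypothetical)
line = 'VmRSS:\t    6088 kB'

parts = line.split()
key = parts[0][2:-1].lower()  # 'VmRSS:' -> drop 'Vm' and ':' -> 'rss'
value = int(parts[1])         # the number of kB

assert key == 'rss'
assert value == 6088
```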
Upvotes: 9
Views: 9875
Reputation: 532093
This loop
for article in corpus:
    for word in article:
        del word
    del article
does not free memory. del word simply decrements the reference count of the object referenced by the name word. However, your loop increments the reference count of each object by one when the loop variable is bound. In other words, there is no net change in the reference count of any object due to this loop.
When you comment out the call to corpus.append, you are not keeping any references to objects read from the file from one iteration to the next, so the interpreter is free to deallocate the memory earlier, which accounts for the decrease in memory you observe.
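To confirm that dropping the last strong reference lets CPython reclaim an object immediately (this relies on CPython's reference counting; other implementations may collect later), here is a weakref sketch:

```python
import weakref

class Article:
    pass

a = Article()
ref = weakref.ref(a)
assert ref() is a    # alive while a strong reference exists

del a                # last strong reference gone; CPython frees it at once
assert ref() is None # the weak reference now reports the object as dead
```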
Upvotes: 1
Reputation: 310117
Please note that Python never guarantees that any memory your code uses will actually get returned to the OS. All that garbage collection guarantees is that the memory used by a collected object is free to be used by another object at some future time.
From what I've read[1] about the CPython implementation of the memory allocator, memory gets allocated in "pools" for efficiency. When a pool is full, Python allocates a new pool. If a pool contains only dead objects, CPython actually frees the memory associated with that pool, but otherwise it doesn't. This can leave multiple partially full pools hanging around after a function returns. However, this doesn't really mean it is a "memory leak": CPython still knows about the memory and could potentially free it at some later time.
[1] I'm not a Python dev, so these details are likely to be incorrect or at least incomplete.
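One way to see the gap between "returned to the allocator" and "returned to the OS" is sys.getallocatedblocks() (available in CPython 3.4+), which counts the blocks the interpreter's allocator currently has handed out, independent of what the OS-level RSS shows. A rough sketch:

```python
import gc
import sys

gc.collect()
before = sys.getallocatedblocks()

data = [[i] for i in range(100000)]  # allocate ~100k small objects
during = sys.getallocatedblocks()

del data
gc.collect()
after = sys.getallocatedblocks()

# the allocator sees the blocks come back, even if RSS stays high
assert during > before
assert after < during
```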
Upvotes: 8