Reputation: 3382
I'm using lxml to parse some pretty big XML files (around 15 MB each). What I'm conceptually doing is the following:
import lxml.etree as ET

def process_xmls():
    for xml_file in xml_files:
        tree = ET.parse(xml_file)
        # etc. etc.
Now, I'm calling the function, and I see the memory increasing and increasing, which is reasonable. The problem is that even after the function ends, the memory stays high and Python does not release it. Why is that, and is there any workaround?
Upvotes: 3
Views: 311
Reputation: 20450
It may be that lxml called malloc(), which called sbrk(). And then virtual memory never gets any smaller.
But that's not the end of the world. The VSZ reported by ps may never shrink, but under memory pressure from other processes the RSS should shrink as pages are paged out. Depending on the activity pattern of your app, which you haven't described, those "cold" pages may never be referenced again, so your long-lived Python process winds up with a small memory footprint despite the large VSZ.
If your app can run for 24 hours, sometimes reading 15 MiB files, with stable memory numbers, then it's not leaking. The first file read will inflate the memory figures, but as long as subsequent file reads don't lead to monotonically increasing memory consumption, you should be in good shape.
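If you want to check that for yourself, here is a minimal, Linux-only sketch (my own assumption about how you might instrument it, not your code) that logs the resident set size after each parse by reading VmRSS from /proc/self/status. A number that jumps on the first file and then plateaus matches the "stable memory numbers" above; monotonic growth would be an actual leak.

import lxml.etree as ET

def current_rss_kib():
    # Linux-only: the VmRSS line in /proc/self/status is the resident set size in kB.
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return -1

def process_xmls(xml_files):
    for xml_file in xml_files:
        tree = ET.parse(xml_file)
        # ... do the real work with tree here ...
        print(f"{xml_file}: RSS = {current_rss_kib()} KiB")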
If you're very upset about the footprint, consider telling your long-lived app to use multiprocessing to fork off short-lived parser processes. They will call sbrk(), sbrk(), sbrk(), then exit(), and all resources will be immediately reclaimed.
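Here is a minimal sketch of that multiprocessing idea, assuming each file can be parsed independently and that whatever you actually need from a file (here just the root tag, as a placeholder) is small enough to send back to the parent. With maxtasksperchild=1 every worker is retired after one file, so the inflated heap dies with the worker:

import multiprocessing as mp
import lxml.etree as ET

def parse_one(xml_file):
    # Runs in a short-lived worker process; when the worker exits,
    # every page the parser touched goes back to the OS.
    tree = ET.parse(xml_file)
    return tree.getroot().tag  # placeholder for the small result you really extract

def process_xmls(xml_files):
    # maxtasksperchild=1 replaces each worker after a single task,
    # so no worker carries a large heap over to the next file.
    with mp.Pool(processes=2, maxtasksperchild=1) as pool:
        return pool.map(parse_one, xml_files)

On platforms that spawn rather than fork (Windows, and macOS by default on recent Pythons), call process_xmls() from under an if __name__ == "__main__": guard.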
Upvotes: 1