Rainbolt

Reputation: 3660

Why does this code use more and more memory over time?

Python: 3.11 Saxonche: 12.4.2

My website keeps consuming more and more memory until the server runs out of memory and crashes. I isolated the problematic code to the following script:

import gc
from time import sleep

from saxonche import PySaxonProcessor


xml_str = """
<root>
    <stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
    <stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
    <stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
</root>
"""

while True:
    print('Running once...')
    with PySaxonProcessor(license=False) as proc:
        proc.parse_xml(xml_text=xml_str)

    gc.collect()
    sleep(1)

This script consumes memory at a rate of about 0.5 MB per second. The memory usage does not plateau after a while. I have logs showing that memory usage continues to grow for hours until the server runs out of memory and crashes.

Other things I tried that aren't shown above:

I have to use Saxon instead of lxml because I need XPath 3.0 support.

What am I doing wrong? How do I parse XML using Saxon in a way that doesn't leak?


A few folks have suggested that instantiating the PySaxonProcessor once before the loop will fix the leak. It doesn't. This still leaks:

with PySaxonProcessor(license=False) as proc:
    while True:
        print('Running once...')
        proc.parse_xml(xml_text=xml_str)

        gc.collect()
        sleep(1)

Upvotes: 4

Views: 319

Answers (3)

Norm

Reputation: 1036

It looks like a memory leak. I created a bug to track it: https://saxonica.plan.io/issues/6391

The issue is now fixed in SaxonC 12.5, which has been released.
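
To confirm that an environment actually has the fixed release, the installed package version can be compared against 12.5. What follows is only a minimal sketch using the standard library. The helper names `parse_version` and `has_leak_fix` are hypothetical, and it assumes the distribution is installed under the name `saxonche` and that its version string starts with dot-separated integers.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption taken from the answer above: the leak is fixed as of 12.5.
FIXED_IN = (12, 5)


def parse_version(text):
    """Return the leading dot-separated integers of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def has_leak_fix(version_text):
    """True if the given version is at least the release that contains the fix."""
    return parse_version(version_text) >= FIXED_IN


if __name__ == "__main__":
    try:
        installed = version("saxonche")
    except PackageNotFoundError:
        print("saxonche is not installed")
    else:
        status = "has the fix" if has_leak_fix(installed) else "is affected, upgrade to 12.5 or later"
        print(f"saxonche {installed} {status}")
```

With the version from the question (12.4.2), `has_leak_fix` returns False, so upgrading the package is the first thing to try.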

Upvotes: 3

Adon Bilivit

Reputation: 26825

There's clearly a failure to properly clean up once the context manager terminates - i.e., PySaxonProcessor.__exit__ isn't doing what it (probably) should do.

You need to contact the developer(s) as this isn't a Python issue per se. You are not doing anything wrong.

The problem can be replicated as follows:

from saxonche import PySaxonProcessor
import psutil

count = 0
process = psutil.Process()
prev = process.memory_info().rss

for _ in range(100):
    with PySaxonProcessor(license=False):
        pass
    if (count := count + 1) % 10 == 0:
        m = process.memory_info().rss
        print(f"{m - prev:,}")
        prev = m

Platform:

macOS 14.4.1
Python 3.12.2
M2

Output:

2,228,224
2,244,608
2,260,992
2,244,608
2,228,224
2,244,608
2,244,608
2,228,224
2,228,224
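
The same measurement can be wrapped in a small reusable helper so that other candidate code, such as the `parse_xml` loop from the question, can be checked the same way. This is a sketch rather than part of the original repro: `measure_growth` and its parameters are hypothetical names, and the memory reader is passed in as a callable, so `psutil.Process().memory_info().rss` is only one possible source.

```python
def measure_growth(action, read_memory, iterations=100, sample_every=10):
    """Call action() repeatedly and return the memory delta of each sample window.

    action      -- zero-argument callable containing the code under suspicion
    read_memory -- zero-argument callable returning current memory use in bytes
    """
    deltas = []
    prev = read_memory()
    for i in range(1, iterations + 1):
        action()
        if i % sample_every == 0:
            current = read_memory()
            deltas.append(current - prev)
            prev = current
    return deltas


# Example use with psutil (third-party), mirroring the repro above:
#
#   import psutil
#   from saxonche import PySaxonProcessor
#
#   process = psutil.Process()
#
#   def create_and_release():
#       with PySaxonProcessor(license=False):
#           pass
#
#   for delta in measure_growth(create_and_release, lambda: process.memory_info().rss):
#       print(f"{delta:,}")
```

Deltas that stay positive and roughly constant, as in the output above, point to steady growth rather than a one-time warm-up cost.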

Upvotes: 4

Ghorban M. Tavakoly

Reputation: 1249

The Saxon processor is used in a loop without being properly cleaned up. Each time the loop runs, a new PySaxonProcessor instance is created, but it is not properly released after it finishes processing the XML. This can cause a memory leak because the resources used by the processor are never freed.

Rewrite your code like this:

with PySaxonProcessor(license=False) as proc:
    while True:
        print('Running once...')
        proc.parse_xml(xml_text=xml_str)
        sleep(1)

I didn't test it, but it will probably solve your problem.


Edit: From its documentation for __exit__(...): "The exit method for the context PySaxonProcessor. Here we release the Jet VM resources. If we have more than one live PySaxonProcessor object the release() method has no effect."

Maybe this is the root cause.

Upvotes: -3
