Stuart Lacy

Reputation: 2003

PyPy memory usage increasing over time

I've noticed some oddities in the memory usage of my program when running it under PyPy and CPython. Under PyPy the program not only starts with substantially more memory than under CPython, but its memory usage also increases quite dramatically over time. At the end of the run it is using around 170MB under PyPy, compared to 14MB under CPython.

I found a user with the exact same problem, albeit on a smaller scale (see "pypy memory usage grows forever?"), but the solutions that worked for him helped my program only slightly. The two things I tried were setting the environment variables PYPY_GC_MAX=100MB and PYPY_GC_GROWTH=1.1, and manually calling gc.collect() at each generation.
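For reference, the manual collection looks roughly like the sketch below. The names `evolve` and `step` are hypothetical placeholders for the real breeding code, not the actual program.

```python
import gc


def evolve(population, step, generations=100):
    """Run `step` once per generation, forcing a collection after each.

    `step` is a hypothetical callable that takes the current population
    and returns the next one; it stands in for the real GP code.
    """
    for _ in range(generations):
        population = step(population)
        gc.collect()  # request a full collection after every generation
    return population
```

The environment variables are set in the shell before launching the interpreter, for example `PYPY_GC_MAX=100MB PYPY_GC_GROWTH=1.1 pypy program.py`, assuming a POSIX shell and a script named `program.py` (a made-up name).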

I'm determining the memory usage with

resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1000
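Wrapped in a small helper, the measurement might look like the sketch below. The function name `peak_rss_mb` is my own, and the platform check assumes only Linux and macOS matter. Note that `ru_maxrss` is the peak resident set size, so the number never goes down even if a later collection frees memory.

```python
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of this process in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return peak / (1024.0 * 1024.0)
    return peak / 1024.0
```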

Here's the runtime and memory usage under different conditions:

Version: time taken, memory used at end of run
PyPy 2.5.0: 100s, 173MB
PyPy with PYPY_GC_MAX = 100MB and PYPY_GC_GROWTH = 1.1: 102s, 178MB
PyPy with gc.collect(): 108s, 131MB
Python 2.7.3: 167s, 14MB

As you can see, the program is much quicker under PyPy than under CPython, which is why I moved to it in the first place, but at the cost of a more than 10-fold increase in memory.

The program is an implementation of Genetic Programming, where I'm evolving an arithmetic binary tree over 100 generations, with 200 individuals in the population. Each node in the tree has a reference to its 2 children, and these trees can increase in size, although for this experiment they stay relatively stable. Depending on the application, this program can run for anywhere from 10 minutes to a few hours, but for the results here I've used a smaller dataset to highlight the issue.
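To give an idea of the data structure, here is a minimal sketch of such a tree. The class name `Node`, its fields, and its methods are illustrative assumptions rather than the real code; `size()` is included only because it is a convenient way to check that the trees are not growing.

```python
import operator


class Node(object):
    """One node of an arithmetic expression tree (hypothetical layout)."""

    def __init__(self, op=None, left=None, right=None, value=None):
        self.op = op        # binary function such as operator.add, or None for a leaf
        self.left = left    # reference to the first child
        self.right = right  # reference to the second child
        self.value = value  # constant held by a leaf

    def evaluate(self):
        """Evaluate the subtree rooted at this node."""
        if self.op is None:
            return self.value
        return self.op(self.left.evaluate(), self.right.evaluate())

    def size(self):
        """Count the nodes in the subtree rooted at this node."""
        if self.op is None:
            return 1
        return 1 + self.left.size() + self.right.size()


# 2 + (3 * 4)
tree = Node(operator.add,
            Node(value=2),
            Node(operator.mul, Node(value=3), Node(value=4)))
```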

Does anyone have any idea a) what could be causing this, and b) if it's possible to limit the memory usage to somewhat more respectable levels?

Upvotes: 3

Views: 3130

Answers (1)

Armin Rigo

Reputation: 12900

PyPy is known to use more baseline memory than CPython, and this number is known to increase over time as the JIT compiles more and more machine code. It does (or at least should) converge: memory usage should increase as your program runs, but only up to a maximum. You should get roughly the same usage after running for 10 minutes as after several hours.
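One way to check for convergence is to record memory usage once per generation and see whether the most recent readings have flattened out. The helper below is only a sketch: the name `has_plateaued`, the window size, and the 1MB tolerance are arbitrary assumptions, and the readings could come from `ru_maxrss` as in the question.

```python
def has_plateaued(samples_mb, window=10, tolerance_mb=1.0):
    """Return True if usage grew by at most `tolerance_mb` over the last `window` steps.

    `samples_mb` is a list of memory readings in megabytes, oldest first.
    """
    if len(samples_mb) < window + 1:
        return False  # not enough readings to judge yet
    recent = samples_mb[-(window + 1):]
    return recent[-1] - recent[0] <= tolerance_mb
```

If this still returns False after a long run, with usage climbing steadily, that would be worth reporting as a bug.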

We can discuss endlessly whether 170MB is too much for a "baseline". What I can say is that a program that uses several GBs of memory on CPython does not use significantly more on PyPy. That's our goal and our experience so far, but please report it as a bug if your experience is different.

Upvotes: 9
