SanSim

Reputation: 111

Simple Python script using loads of memory

I'm writing a very simple script that reads a fairly large file (3M lines, 1.1 GB) containing literal (str) expressions of polynomials. I then use SymPy for some symbolic calculation and write the results to 16 separate files.

As it runs, my script's memory usage keeps growing (beyond 20 GB), and I can't understand why. Do you see any way to improve the memory usage of this script?

from sympy import sympify
from sympy.abc import x,y
from sympy import degree

fin = open("out_poly","r")
A = fin.readlines()
fin.close()
deg = 4
fou = [open("coeff_x"+str(i)+"y"+str(k),"w") for i in range(deg+1) for k in range(deg+1-i)]

for line in A:
  expr = line.replace("^","**").replace("x0","x").replace("x1","y")
  exprsy = sympify(expr)
  cpt = 0
  for i in range(deg+1):
    for k in range(deg+1-i):
      fou[cpt].write(str(exprsy.coeff(x,i).coeff(y,k))+"\n")
      cpt = cpt+1

for files in fou:
  files.close()

Upvotes: 4

Views: 1263

Answers (3)

jb.

Reputation: 23955

I had the same issue today! In my case the cache wouldn't bring any benefit, as I knew my code couldn't exploit it (SymPy was used inside a fitted function, and the parameters were different on each iteration).

So I wanted to disable the cache altogether, and it turns out this is possible: there is an environment variable, SYMPY_USE_CACHE. If it is set to yes, the cache is enabled; if it is set to no, it is disabled altogether.

So I just added:

export SYMPY_USE_CACHE=no
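If you can't (or don't want to) set the variable in your shell, a sketch of the same idea from within Python: set the variable before SymPy is imported, since SymPy reads it once at import time. (The polynomial string below is just a placeholder, not from the question.)

```python
import os

# Must be set BEFORE the first `import sympy`;
# SymPy reads SYMPY_USE_CACHE only at import time.
os.environ["SYMPY_USE_CACHE"] = "no"

from sympy import sympify

# With the cache disabled, repeated sympify calls no longer
# accumulate cached expressions in memory.
expr = sympify("x**2 + 1")
print(expr)
```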

Upvotes: 0

aIKid

Reputation: 28242

The problem is probably that the whole file is being read into memory at once. These lines:

fin = open("out_poly","r")
A = fin.readlines()
fin.close()

store the entire content of fin in memory as a list of strings, which is why the script takes up so much space.

Instead of storing it in A, you can loop straight through the file itself:

from sympy import sympify
from sympy.abc import x,y
from sympy import degree

deg = 4
fou = [open("coeff_x"+str(i)+"y"+str(k),"w") for i in range(deg+1) for k in range(deg+1-i)]

with open("out_poly") as A:
    for line in A:
      expr = line.replace("^","**").replace("x0","x").replace("x1","y")
      exprsy = sympify(expr)
      cpt = 0
      for i in range(deg+1):
        for k in range(deg+1-i):
          fou[cpt].write(str(exprsy.coeff(x,i).coeff(y,k))+"\n")
          cpt = cpt+1

for files in fou:
    files.close() # close the output files only after all input lines are processed

With this, the file is read one line at a time, instead of holding a full copy of it in memory.

Hope this helps!

Upvotes: 4

SanSim

Reputation: 111

Found it! The culprit was... SymPy!

SymPy caches expressions, and the cache fills up the memory. The problem can be solved by setting the environment variable SYMPY_USE_CACHE=no, but that can seriously hurt SymPy's performance. A better alternative is to import SymPy's cache utilities:

from sympy.core.cache import clear_cache

and clear up the cache in your code at adequate intervals:

clear_cache()

With this call at each iteration of my loop, memory usage is stable and constant at only 26 MB.

Links about the issue: http://code.google.com/p/sympy/issues/detail?id=3222

Links about Sympy cache: https://github.com/sympy/sympy/wiki/faq

Thanks all for your help.

Upvotes: 7
