Reputation: 111
I'm writing a very simple script that reads a fairly large file (3M lines, 1.1 GB) containing the literal (str) expressions of polynomials. I then use Sympy for some symbolic calculation and write the results to 16 separate files.
As it runs, my script uses an ever-increasing amount of memory (> 20 GB), and I can't understand why. Can you see any way to improve the memory usage of that script?
from sympy import sympify
from sympy.abc import x,y
from sympy import degree
fin = open("out_poly","r")
A = fin.readlines()
fin.close()
deg = 4
fou = [open("coeff_x"+str(i)+"y"+str(k),"w") for i in range(deg+1) for k in range(deg+1-i)]
for line in A:
    expr = line.replace("^","**").replace("x0","x").replace("x1","y")
    exprsy = sympify(expr)
    cpt = 0
    for i in range(deg+1):
        for k in range(deg+1-i):
            fou[cpt].write(str(exprsy.coeff(x,i).coeff(y,k))+"\n")
            cpt = cpt+1
for files in fou:
    files.close()
Upvotes: 4
Views: 1263
Reputation: 23955
I had the same issue today! In my case the cache wouldn't bring any benefit (Sympy was used inside a fitted function, and the parameters were different on each iteration).
So I wanted to disable the cache altogether, and it turns out that this is possible: there is an environment variable, SYMPY_USE_CACHE. If it is set to yes, the cache is enabled; if it is set to no, it is disabled altogether.
So I just added:
export SYMPY_USE_CACHE=no
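If changing the shell environment is inconvenient, the same variable can be set from inside the script. This is only a sketch: it assumes the variable is read when Sympy is first imported, so the assignment has to come before any `import sympy` in the process. The sample expression is made up for illustration.

```python
import os

# Assumption: Sympy reads SYMPY_USE_CACHE at import time, so this line
# must run before the first Sympy import anywhere in the process.
os.environ["SYMPY_USE_CACHE"] = "no"

from sympy import sympify  # imported only after the variable is set

# Illustrative expression, not taken from the original data file.
expr = sympify("x**2 + 2*x*y")
print(expr)
```

As noted elsewhere in this thread, running without the cache can slow Sympy down considerably, so it is worth timing both variants on a sample of the real input.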
Upvotes: 0
Reputation: 28242
The problem is probably that fin is too large to be stored in the buffer. These lines:
fin = open("out_poly","r")
A = fin.readlines()
fin.close()
store the whole content of fin in memory, which is why you are using so much of it.
Instead of storing it in A, you can loop straight through the file itself:
from sympy import sympify
from sympy.abc import x,y
from sympy import degree
deg = 4
fou = [open("coeff_x"+str(i)+"y"+str(k),"w") for i in range(deg+1) for k in range(deg+1-i)]
with open("out_poly") as A:
    for line in A:  # the file object yields one line at a time
        expr = line.replace("^","**").replace("x0","x").replace("x1","y")
        exprsy = sympify(expr)
        cpt = 0
        for i in range(deg+1):
            for k in range(deg+1-i):
                fou[cpt].write(str(exprsy.coeff(x,i).coeff(y,k))+"\n")
                cpt = cpt+1
# close the output files only after every input line has been processed;
# closing them inside the loop would make the second line's writes fail
for f in fou:
    f.close()
With this, it reads the file itself line by line, rather than a copy of the whole file held in memory.
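The same streaming pattern can be sketched with only the standard library, using `contextlib.ExitStack` so that the input and all output files are closed together even if an error occurs. The helper name `fan_out`, the `transform` callback, and the sample data are hypothetical and only for illustration; in the script above, `transform` would be the Sympy coefficient extraction.

```python
import os
import tempfile
from contextlib import ExitStack


def fan_out(src_path, out_paths, transform):
    """Stream src_path line by line; write transform(line)[j] to out_paths[j]."""
    with ExitStack() as stack:
        src = stack.enter_context(open(src_path))
        outs = [stack.enter_context(open(p, "w")) for p in out_paths]
        for line in src:  # lazy: only one line is held in memory at a time
            for out, value in zip(outs, transform(line)):
                out.write(f"{value}\n")
    # every file registered on the stack is closed here


# Small demonstration with made-up data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "in.txt")
    out_a = os.path.join(tmp, "a.txt")
    out_b = os.path.join(tmp, "b.txt")
    with open(src_path, "w") as f:
        f.write("1 2\n3 4\n")
    fan_out(src_path, [out_a, out_b], str.split)
    with open(out_a) as f:
        contents_a = f.read()
    with open(out_b) as f:
        contents_b = f.read()
```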
Hope this helps!
Upvotes: 4
Reputation: 111
Found it! The culprit was... Sympy!
Sympy caches expressions and fills up the memory. The problem can be solved by setting the environment variable SYMPY_USE_CACHE=no, but that can seriously affect Sympy performance. A better alternative is to import the following Sympy module:
from sympy.core.cache import *
and clear up the cache in your code at adequate intervals:
clear_cache()
With those commands at each iteration in my code, the memory usage is stable and constant at only 26 MB.
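For reference, here is a minimal sketch of how the cache clearing could be folded into the processing loop. The helper names `coefficients` and `process` and the `clear_every` parameter are illustrative, not part of my original script; clearing on every single line, as I did, corresponds to `clear_every=1`.

```python
from sympy import sympify
from sympy.abc import x, y
from sympy.core.cache import clear_cache


def coefficients(line, deg=4):
    """Return the coefficients of x**i * y**k for all i + k <= deg."""
    expr = sympify(line.replace("^", "**").replace("x0", "x").replace("x1", "y"))
    return [expr.coeff(x, i).coeff(y, k)
            for i in range(deg + 1) for k in range(deg + 1 - i)]


def process(lines, clear_every=1000):
    """Yield one coefficient list per input line, clearing the cache periodically."""
    for n, line in enumerate(lines, start=1):
        yield coefficients(line)
        if n % clear_every == 0:
            clear_cache()  # drop cached expressions so memory stays bounded


# Made-up sample input in the same format as the data file.
rows = list(process(["3*x0^2 + 2*x0*x1 + 5", "x1^4"], clear_every=1))
```

A larger `clear_every` keeps more of the cache's speed benefit at the cost of a higher memory peak, so the interval is a trade-off to tune on real data.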
Links about the issue: http://code.google.com/p/sympy/issues/detail?id=3222
Links about Sympy cache: https://github.com/sympy/sympy/wiki/faq
Thanks all for your help.
Upvotes: 7