Reputation: 124
I'm trying to optimize the memory usage of a Python script. The following function, in particular, uses a lot of memory:
import pandas as pd

def fn_merge(df1, df2):
    # Rows of df2 with positive adherence and interesse == False,
    # enriched with the (deduplicated) remaining columns of df1
    enriched = (
        df2.query("aderencia_disciplina > 0 and interesse == False")
        .astype({"cod_docente": "int64"})
        .merge(
            df1.astype({"cod_docente": "int64"})
            .drop(["discod", "interesse", "aderencia_disciplina"], axis=1)
            .drop_duplicates(),
            on=["iunicodempresa", "cod_docente"],
            how="left",
        )
    )
    return pd.concat([df1, enriched])
df1 and df2 have 1.7 and 9.9 MB, respectively. The issue is that the function seems to grab a chunk of memory and never release it: if I execute it ~20 times, RAM usage climbs from ~2 to 8 GB and never drops. Does anyone know what's happening? I thought all the memory used within the function would be freed after it finished executing. Any help appreciated.
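For reference, this is roughly how the growth can be observed between calls (psutil here is just one option for reading the process RSS, not necessarily how I measured it):

import psutil

proc = psutil.Process()
for i in range(20):
    out = fn_merge(df1, df2)
    # Resident set size after each call, in MB
    print(f"call {i}: rss = {proc.memory_info().rss / 1e6:.0f} MB")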
Upvotes: 0
Views: 384
Reputation: 67
See the links below: the memory isn't really leaked. On Linux, glibc's allocator keeps freed pages around for reuse instead of returning them to the OS, and calling malloc_trim from libc.so.6 forces it to release them.
https://github.com/pandas-dev/pandas/issues/2659
Memory leak using pandas dataframe
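A minimal sketch of the workaround discussed in that issue (Linux/glibc only; on other platforms libc.so.6 won't exist):

import ctypes
import gc

def trim_memory():
    # Drop Python-level references first so the heap pages are actually free
    gc.collect()
    # Ask glibc to return free heap pages to the OS (Linux/glibc only)
    libc = ctypes.CDLL("libc.so.6")
    return libc.malloc_trim(0)

Calling trim_memory() after each fn_merge call should bring the RSS back down if glibc's allocator is the culprit.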
I used to have this problem when reading 2 GB CSV files, so I skipped pandas for that step and wrote a function myself using with open("filename", "r") as f:.
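Something along these lines (a sketch only; the column layout and the aggregation are made up for illustration):

def sum_column(path, col_index):
    # Stream the file line by line so only one row is in memory at a time
    total = 0.0
    with open(path, "r") as f:
        next(f)  # skip the header row
        for line in f:
            fields = line.rstrip("\n").split(",")
            total += float(fields[col_index])
    return total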
Hope it helps.
Upvotes: 1