Lfppfs

Reputation: 124

Pandas merge does not release memory

I'm trying to optimize the memory usage of a Python script. The following function, in particular, uses a lot of memory:

import pandas as pd

def fn_merge(df1, df2):
    # Take the rows of df2 with positive aderencia_disciplina and
    # interesse == False, left-merge them with the deduplicated
    # docente columns of df1, then stack the result under df1.
    merged = (
        df2
        .query("aderencia_disciplina > 0 and interesse == False")
        .astype({"cod_docente": "int64"})
        .merge(
            df1
            .astype({"cod_docente": "int64"})
            .drop(["discod", "interesse", "aderencia_disciplina"], axis=1)
            .drop_duplicates(),
            on=["iunicodempresa", "cod_docente"],
            how="left",
        )
    )
    return pd.concat([df1, merged])

df1 and df2 are 1.7 MB and 9.9 MB, respectively. The issue is that the function seems to grab a chunk of memory and never lets it go: if I execute it, say, ~20 times, RAM usage climbs from ~2 GB to 8 GB and never drops. Does anyone know what's happening? I thought all the memory used within the function would be freed once it returned. Any help appreciated.
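For anyone wanting to reproduce the measurement, here is a minimal sketch of how the growth can be tracked. It assumes psutil is installed, and make_frames() is a hypothetical stand-in for however df1 and df2 are actually built:

import os
import psutil

proc = psutil.Process(os.getpid())

df1, df2 = make_frames()  # hypothetical helper that loads the two frames
for i in range(20):
    result = fn_merge(df1, df2)
    del result  # drop the reference; the memory should be reusable now
    rss_mb = proc.memory_info().rss / 1024 ** 2
    print(f"iteration {i}: RSS = {rss_mb:.0f} MB")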

Upvotes: 0

Views: 384

Answers (1)

謝咏辰

Reputation: 67

See this issue: the memory isn't really leaked. On Linux, glibc's allocator holds on to freed memory instead of returning it to the OS, and you can force it to release that memory by calling malloc_trim from libc.so.6.

https://github.com/pandas-dev/pandas/issues/2659

Memory leak using pandas dataframe
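A minimal sketch of that workaround, assuming a glibc-based Linux system where libc.so.6 exposes malloc_trim:

import ctypes
import gc

def trim_memory() -> int:
    gc.collect()                # drop unreachable Python objects first
    libc = ctypes.CDLL("libc.so.6")
    return libc.malloc_trim(0)  # returns 1 if memory was released to the OS

result = fn_merge(df1, df2)
# ... use result ...
del result
trim_memory()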

I ran into the same problem when reading 2 GB CSV files. I worked around it by skipping pandas entirely: I opened the file with `with open("filename", "r") as f:` and wrote the parsing function myself.
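A minimal sketch of that approach, assuming a simple comma-separated file; parse_row is a hypothetical stand-in for whatever per-row processing is needed:

def process_file(filename):
    results = []
    with open(filename, "r") as f:
        header = f.readline().rstrip("\n").split(",")
        for line in f:  # streams one line at a time instead of loading it all
            row = dict(zip(header, line.rstrip("\n").split(",")))
            results.append(parse_row(row))  # hypothetical per-row logic
    return results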

Hope it can help you.

Upvotes: 1
