Memory leak on Python SQLAlchemy?

Question

In this sample code, I track the memory usage every while loop.

In the first 50 loops, the 'Biggest DIFF' is C:\Program Files\Python39\lib racemalloc.py:115 so that's not interesting (also it's not in my real problem). After 50 loops, %userprofile%\AppData\Roaming\Python\Python39\site-packages\sqlalchemy\sql\elements.py:1551 takes over as biggest memory change, with the size growing every loop with 5KiB and the count growing 5 with every loop. That's the size of the df in this example.

The while loop was meant to have no memory leak, as you reassign df every time, then write it, then reassign, then write, and so on, and so on. But it leaks anyway, through SA.

Of course this is an example, I encountered this with a df.shape of [1000000,100] per loop. I don't care about 300 KiB.

What am I not understanding?

import pandas as pd
import sqlalchemy as sa
import tracemalloc

tracemalloc.start(100)

engine = sa.create_engine(
    'mssql+pyodbc://' + UN + ':' + PW + '@' + HOST + '/' + DB +"?driver=ODBC Driver 17 for SQL Server&Encrypt=Yes",
    pool_pre_ping=True,
    fast_executemany=True)

things = ['a', 'b', 'c']
do_count = 0

def get_some_data():
    df = pd.DataFrame([['Ajitesh', 84, 183, 'no'],
                       ['Shailesh', 79, 186, 'yes'],
                       ['Seema', 67, 158, 'yes'],
                       ['Nidhi', 52, 155, 'no'],
                       ['Ajitesh', 84, 183, 'no'],
                       ['Shailesh', 79, 186, 'yes'],
                       ['Seema', 67, 158, 'yes'],
                       ['Nidhi', 52, 155, 'no'],
                       ['Ajitesh', 84, 183, 'no'],
                       ['Shailesh', 79, 186, 'yes'],
                       ['Seema', 67, 158, 'yes'],
                       ['Nidhi', 52, 155, 'no'],
                       ['Ajitesh', 84, 183, 'no'],
                       ['Shailesh', 79, 186, 'yes'],
                       ['Seema', 67, 158, 'yes'],
                       ['Nidhi', 52, 155, 'no'],
                       ['Ajitesh', 84, 183, 'no'],
                       ['Shailesh', 79, 186, 'yes'],
                       ['Seema', 67, 158, 'yes'],
                       ['Nidhi', 52, 155, 'no'],
                       ['Ajitesh', 84, 183, 'no'],
                       ['Shailesh', 79, 186, 'yes'],
                       ['Seema', 67, 158, 'yes'],
                       ['Nidhi', 52, 155, 'no'],
                       ['Ajitesh', 84, 183, 'no'],
                       ['Shailesh', 79, 186, 'yes'],
                       ['Seema', 67, 158, 'yes'],
                       ['Nidhi', 52, 155, 'no']])
    return df


def write_the_data(tablename, collected_data):
    collected_data.to_sql(name=tablename, con=engine)

snapshot1 = tracemalloc.take_snapshot()
for thing in things:
    while do_count < 5000:
        df = get_some_data()
        write_the_data(tablename = thing, collected_data = df)
        do_count += 1
        print("Biggest DIFF: " + str(tracemalloc.take_snapshot().compare_to(snapshot1, 'lineno')[0]))

Memory leak on Python SQLAlchemy?

Answers (1)

Related Questions