JasperKPI
JasperKPI

Reputation: 31

Memory leak on Python SQLAlchemy?

In this sample code, I track the memory usage every while loop.

In the first 50 loops, the 'Biggest DIFF' is C:\Program Files\Python39\lib\tracemalloc.py:115 so that's not interesting (also it's not in my real problem). After 50 loops, %userprofile%\AppData\Roaming\Python\Python39\site-packages\sqlalchemy\sql\elements.py:1551 takes over as biggest memory change, with the size growing every loop with 5KiB and the count growing 5 with every loop. That's the size of the df in this example.

The while loop was meant to have no memory leak, as you reassign df every time, then write it, then reassign, then write, and so on, and so on. But it leaks anyway, through SA.

Of course this is an example, I encountered this with a df.shape of [1000000,100] per loop. I don't care about 300 KiB.

What am I not understanding?

import pandas as pd
import sqlalchemy as sa
import tracemalloc

tracemalloc.start(100)

engine = sa.create_engine(
    'mssql+pyodbc://' + UN + ':' + PW + '@' + HOST + '/' + DB +"?driver=ODBC Driver 17 for SQL Server&Encrypt=Yes",
    pool_pre_ping=True,
    fast_executemany=True)

things = ['a', 'b', 'c']
do_count = 0

def get_some_data():
    df = pd.DataFrame([['Ajitesh', 84, 183, 'no'],
                       ['Shailesh', 79, 186, 'yes'],
                       ['Seema', 67, 158, 'yes'],
                       ['Nidhi', 52, 155, 'no'],
                       ['Ajitesh', 84, 183, 'no'],
                       ['Shailesh', 79, 186, 'yes'],
                       ['Seema', 67, 158, 'yes'],
                       ['Nidhi', 52, 155, 'no'],
                       ['Ajitesh', 84, 183, 'no'],
                       ['Shailesh', 79, 186, 'yes'],
                       ['Seema', 67, 158, 'yes'],
                       ['Nidhi', 52, 155, 'no'],
                       ['Ajitesh', 84, 183, 'no'],
                       ['Shailesh', 79, 186, 'yes'],
                       ['Seema', 67, 158, 'yes'],
                       ['Nidhi', 52, 155, 'no'],
                       ['Ajitesh', 84, 183, 'no'],
                       ['Shailesh', 79, 186, 'yes'],
                       ['Seema', 67, 158, 'yes'],
                       ['Nidhi', 52, 155, 'no'],
                       ['Ajitesh', 84, 183, 'no'],
                       ['Shailesh', 79, 186, 'yes'],
                       ['Seema', 67, 158, 'yes'],
                       ['Nidhi', 52, 155, 'no'],
                       ['Ajitesh', 84, 183, 'no'],
                       ['Shailesh', 79, 186, 'yes'],
                       ['Seema', 67, 158, 'yes'],
                       ['Nidhi', 52, 155, 'no']])
    return df


def write_the_data(tablename, collected_data):
    collected_data.to_sql(name=tablename, con=engine)

snapshot1 = tracemalloc.take_snapshot()
for thing in things:
    while do_count < 5000:
        df = get_some_data()
        write_the_data(tablename = thing, collected_data = df)
        do_count += 1
        print("Biggest DIFF: " + str(tracemalloc.take_snapshot().compare_to(snapshot1, 'lineno')[0]))

Upvotes: 1

Views: 1047

Answers (1)

Baktybek Baiserkeev
Baktybek Baiserkeev

Reputation: 675

  1. Use with statement for SQLAlchemy connection to ensure it was properly closed.
  2. Regularly flush or close SQLAlchemy sessions after their work is done.
  3. Use more robust memory profiling tool like memory-profiler to better understand your program's memory usage.

Upvotes: 1

Related Questions