Reputation: 31
In this sample code, I track the memory usage every while loop.
In the first 50 loops, the 'Biggest DIFF' is C:\Program Files\Python39\lib\tracemalloc.py:115
so that's not interesting (also it's not in my real problem). After 50 loops, %userprofile%\AppData\Roaming\Python\Python39\site-packages\sqlalchemy\sql\elements.py:1551
takes over as biggest memory change, with the size growing every loop with 5KiB and the count growing 5 with every loop. That's the size of the df in this example.
The while loop was meant to have no memory leak, as you reassign df
every time, then write it, then reassign, then write, and so on, and so on. But it leaks anyway, through SA.
Of course this is an example, I encountered this with a df.shape of [1000000,100] per loop. I don't care about 300 KiB.
What am I not understanding?
import pandas as pd
import sqlalchemy as sa
import tracemalloc
tracemalloc.start(100)
engine = sa.create_engine(
'mssql+pyodbc://' + UN + ':' + PW + '@' + HOST + '/' + DB +"?driver=ODBC Driver 17 for SQL Server&Encrypt=Yes",
pool_pre_ping=True,
fast_executemany=True)
things = ['a', 'b', 'c']
do_count = 0
def get_some_data():
df = pd.DataFrame([['Ajitesh', 84, 183, 'no'],
['Shailesh', 79, 186, 'yes'],
['Seema', 67, 158, 'yes'],
['Nidhi', 52, 155, 'no'],
['Ajitesh', 84, 183, 'no'],
['Shailesh', 79, 186, 'yes'],
['Seema', 67, 158, 'yes'],
['Nidhi', 52, 155, 'no'],
['Ajitesh', 84, 183, 'no'],
['Shailesh', 79, 186, 'yes'],
['Seema', 67, 158, 'yes'],
['Nidhi', 52, 155, 'no'],
['Ajitesh', 84, 183, 'no'],
['Shailesh', 79, 186, 'yes'],
['Seema', 67, 158, 'yes'],
['Nidhi', 52, 155, 'no'],
['Ajitesh', 84, 183, 'no'],
['Shailesh', 79, 186, 'yes'],
['Seema', 67, 158, 'yes'],
['Nidhi', 52, 155, 'no'],
['Ajitesh', 84, 183, 'no'],
['Shailesh', 79, 186, 'yes'],
['Seema', 67, 158, 'yes'],
['Nidhi', 52, 155, 'no'],
['Ajitesh', 84, 183, 'no'],
['Shailesh', 79, 186, 'yes'],
['Seema', 67, 158, 'yes'],
['Nidhi', 52, 155, 'no']])
return df
def write_the_data(tablename, collected_data):
collected_data.to_sql(name=tablename, con=engine)
snapshot1 = tracemalloc.take_snapshot()
for thing in things:
while do_count < 5000:
df = get_some_data()
write_the_data(tablename = thing, collected_data = df)
do_count += 1
print("Biggest DIFF: " + str(tracemalloc.take_snapshot().compare_to(snapshot1, 'lineno')[0]))
Upvotes: 1
Views: 1047
Reputation: 675
with
statement for SQLAlchemy connection to ensure it was properly closed.memory-profiler
to better understand your program's memory usage.Upvotes: 1