Our Django web server has quite limited resources, so we have to be careful with how much memory we use. One part of the server is a cron job (using Celery and RabbitMQ) that parses a ~130MB CSV file into our Postgres database. The CSV file is saved to disk and then read row by row with Python's `csv` module. Because the CSV file is essentially a feed, we use `bulk_upsert` from the custom Postgres manager in django-postgres-extra to upsert our data and overwrite existing entries. Recently we started getting memory errors, and we eventually traced them to Django.
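For reference, the read-row-by-row-then-upsert-in-batches pattern described above can be sketched like this. It is a minimal sketch, not the poster's actual code: `upsert_batch` is a hypothetical placeholder standing in for the real `bulk_upsert` call, and the column names in the demo are made up. The point it illustrates is that only one batch of rows should be alive at a time, so memory stays flat unless something else keeps references to the data after each batch is written.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO

BATCH_SIZE = 15000  # rows per upsert, as in the question


def iter_batches(f: TextIO, batch_size: int = BATCH_SIZE) -> Iterator[List[Dict[str, str]]]:
    """Read a CSV lazily and yield lists of at most batch_size rows.

    Only one batch is held here at a time; memory stays flat as long as
    nothing downstream keeps references to earlier batches or their SQL.
    """
    reader = csv.DictReader(f)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


def upsert_batch(rows: List[Dict[str, str]]) -> None:
    """Hypothetical placeholder for the bulk_upsert call on the model manager."""
    print(f"would upsert {len(rows)} rows")


if __name__ == "__main__":
    # In the real job this would be open(path, newline="") on the saved feed file.
    sample = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
    for rows in iter_batches(sample, batch_size=2):
        upsert_batch(rows)
```

With this structure, if memory still grows by a fixed amount per batch, the growth must come from something retaining data outside the loop (which is what the answer below eventually found).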
Running `mem_top()` showed us that Django was keeping the massive upsert queries (`INSERT ... ON CONFLICT DO`), including their metadata, in memory. Each `bulk_upsert` of 15,000 rows would add about 40MB to the memory used by Python, leading to a total of 1GB by the time the job finished, as we upsert 750,000 rows in total. Apparently Django does not release a query from memory after it has finished. Running the cron job without the upsert call leads to a maximum memory usage of 80MB, of which 60MB is Celery's baseline.
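As a standard-library alternative to `mem_top()` (a swapped-in technique, not what the poster used), `tracemalloc` can show which source lines own memory that survives across repeated calls. This is a minimal sketch under the assumption that you can wrap "upsert one batch" in a zero-argument function; `report_growth` and its parameters are hypothetical names. Lines inside Django, psycopg2 or a logging/error-reporting library that keep growing between snapshots would point at whatever is retaining the query strings.

```python
import tracemalloc
from typing import Callable, List, Tuple


def report_growth(step: Callable[[], object], repeats: int = 3, top: int = 5) -> List[Tuple[str, int]]:
    """Call step() repeatedly and return the source lines whose memory grew most.

    Allocations freed after each call cancel out; memory that is still
    referenced afterwards (e.g. retained query strings) shows up as growth.
    """
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    try:
        baseline = tracemalloc.take_snapshot()
        for _ in range(repeats):
            step()
        current = tracemalloc.take_snapshot()
        # compare_to sorts by absolute size difference, largest first.
        stats = current.compare_to(baseline, "lineno")
        return [(str(stat.traceback), stat.size_diff) for stat in stats[:top]]
    finally:
        # Only stop tracing if this function started it.
        if not was_tracing:
            tracemalloc.stop()
```

In the cron job you might pass something like a lambda that upserts the next batch and print the returned `(location, bytes)` pairs; by default each location is only the innermost frame, so you may need `tracemalloc.start(25)` beforehand to see the calling code.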
We tried running `gc.collect()` and `django.db.reset_queries()`, but the queries are still stored in memory. Our `DEBUG` setting is `False` and `CONN_MAX_AGE` is not set either. We're currently out of clues about where to look, and we can't run our cron jobs until this is fixed. Do you know of any last resorts we could try to resolve this issue?
Some more meta info regarding our server:
django==2.1.3
django-elasticsearch-dsl==0.5.1
elasticsearch-dsl==6.1.0
psycopg2-binary==2.7.5
gunicorn==19.9.0
celery==4.3.0
django-celery-beat==1.5.0
django-postgres-extra==1.22
Thank you very much in advance!
Today I found the solution to our issue, so I thought it would be good to share it. It turned out that the problem was a combination of Django and Sentry (which we only use on our production server). Django would log each query, and Sentry would capture that log entry and keep it in memory for some reason. As each raw SQL query was about 40MB, this ate a lot of memory. For now, we have turned Sentry off on our cron job server and are looking into a way to clear the logs kept by Sentry.
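Rather than switching Sentry off entirely, one option worth trying is to stop it from retaining the huge SQL strings as breadcrumbs. This is a minimal sketch assuming the newer `sentry-sdk` client (the post doesn't say whether it is `sentry-sdk` or the older `raven` client). The category names `"query"` and `"django.db.backends"`, the 1,000-character threshold and the `max_breadcrumbs=20` value are assumptions, so check which breadcrumbs your setup actually produces. `init_sentry` is a hypothetical helper; `before_breadcrumb` and `max_breadcrumbs` are `sentry_sdk.init` options.

```python
from typing import Any, Dict, Optional

# Assumed breadcrumb categories for SQL: "query" for the Django integration's
# SQL breadcrumbs, "django.db.backends" for breadcrumbs made from that logger.
SQL_CATEGORIES = {"query", "django.db.backends"}
MAX_MESSAGE_CHARS = 1000  # hypothetical cap; small queries are still kept


def before_breadcrumb(crumb: Dict[str, Any], hint: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Drop SQL breadcrumbs whose message is too large to keep in memory."""
    if crumb.get("category") in SQL_CATEGORIES:
        message = crumb.get("message") or ""
        if len(message) > MAX_MESSAGE_CHARS:
            return None  # returning None tells Sentry to discard the breadcrumb
    return crumb


def init_sentry(dsn: str) -> None:
    """Hypothetical helper; imported lazily so the filter is usable without Sentry."""
    import sentry_sdk

    sentry_sdk.init(
        dsn=dsn,
        before_breadcrumb=before_breadcrumb,
        max_breadcrumbs=20,  # also bounds how many breadcrumbs are kept at once
    )
```

Dropping only oversized SQL breadcrumbs keeps ordinary queries in error reports while preventing 40MB upsert statements from accumulating in the breadcrumb buffer.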