Reputation: 23
I work at a company with a large database, and I want to run some update queries on it, but they seem to cause a huge memory leak. The query is as follows:
import datetime
import pytz

c = CallLog.objects.all()
for i in c:
    i.cdate = pytz.utc.localize(datetime.datetime.strptime(i.fixed_date, "%y-%m-%d %H:%M"))
    i.save()
I ran this in the Django interactive shell. I even tried wrapping the loop in
with transaction.atomic():
but that didn't help. Do you have any idea how I can track down the source of the leak?
The dataset I am working on is about 27 million rows.
fixed_date is a calculated property.
Upvotes: 2
Views: 1138
Reputation: 611
You could try something like this:
import datetime
import pytz
from django.core.paginator import Paginator

# process the table in pages of 2000 rows instead of loading everything at once
p = Paginator(CallLog.objects.all().only('cdate'), 2000)
for page in range(1, p.num_pages + 1):
    for i in p.page(page).object_list:
        i.cdate = pytz.utc.localize(datetime.datetime.strptime(i.fixed_date, "%y-%m-%d %H:%M"))
        i.save()
Slicing a queryset does not load all of the objects into memory just to get a subset; it adds LIMIT and OFFSET to the SQL query before hitting the database.
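If you would rather not use Paginator, here is a minimal sketch of the same idea with plain queryset slicing; the chunk size of 2000 and the ordering by pk are assumptions, tune them to your memory budget:

import datetime
import pytz

CHUNK = 2000  # assumed batch size
qs = CallLog.objects.order_by('pk')
total = qs.count()
for start in range(0, total, CHUNK):
    # each slice is evaluated as its own LIMIT/OFFSET query,
    # so only CHUNK rows are held in memory at a time
    for i in qs[start:start + CHUNK]:
        i.cdate = pytz.utc.localize(datetime.datetime.strptime(i.fixed_date, "%y-%m-%d %H:%M"))
        i.save()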
Upvotes: 1
Reputation: 2578
Try breaking it into smaller blocks (since you only have 4 GB of RAM):
c = CallLog.objects.filter(somefield=somevalue)
When it's necessary, I usually split on a character or a number (for example, IDs ending in 1, 2, 3, 4, and so on).
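A rough sketch of that idea, splitting the work into primary-key ranges; the integer auto-increment id and the block size of 100000 are assumptions, not part of the original answer:

import datetime
import pytz
from django.db.models import Max

STEP = 100000  # assumed block size; tune to available memory
max_id = CallLog.objects.aggregate(m=Max('id'))['m'] or 0
for start in range(0, max_id + 1, STEP):
    # each block is a separate query, so only one block's rows are in memory at a time
    for i in CallLog.objects.filter(id__gte=start, id__lt=start + STEP):
        i.cdate = pytz.utc.localize(datetime.datetime.strptime(i.fixed_date, "%y-%m-%d %H:%M"))
        i.save()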
Upvotes: 0
Reputation: 16515
You could try to iterate over the queryset in batches using the .iterator()
method, and see if that improves anything:
import datetime
import pytz

# .iterator() streams rows instead of caching the whole queryset in memory
for obj in CallLog.objects.all().iterator():
    obj.cdate = pytz.utc.localize(
        datetime.datetime.strptime(obj.fixed_date, "%y-%m-%d %H:%M"))
    obj.save()
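On Django 2.0 and later, .iterator() also accepts a chunk_size argument that controls how many rows are fetched per round trip; a hedged variation of the loop above, assuming 2000 is a reasonable batch size:

# same loop, but fetching 2000 rows per batch from the database cursor
for obj in CallLog.objects.all().iterator(chunk_size=2000):
    obj.cdate = pytz.utc.localize(
        datetime.datetime.strptime(obj.fixed_date, "%y-%m-%d %H:%M"))
    obj.save()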
Here is a related answer I found, but it is a few years old.
Upvotes: 0