Reputation: 2523
I want to iterate over all the objects of a table (Post). I am using the code below:
posts = Post.objects.all()
for post in posts:
    process_post(post)
process_post is a Celery task which runs in the background and does not update the post. The problem is that the Post table has 1 million records. This is not a one-time job; I run it daily.
for post in posts
In the line above, the query is evaluated and fetches all the data from the DB in one go.
How can I improve its performance? Is there a way to fetch the data in batches?
Upvotes: 5
Views: 5135
Reputation: 1
Django is not meant for processing data like this; it is just a framework for building APIs and an ORM for the front end.
You can limit how much you process depending on your memory and database, e.g. obj = Post.objects.all()[:30000] or [:50000] (see the sketch below).
If you want to show the data in an HTML front end, use pagination.
If you want to process it on the backend, don't use the Django ORM: create a materialized view and a database job in the database (this is very easy in an Oracle database).
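A rough sketch of both ideas, assuming the Post model and process_post task from the question (the 30000 cap, the 1000-per-page size, and the "id" ordering field are arbitrary choices, not anything stated in this answer):

from django.core.paginator import Paginator

# Cap how many rows are pulled into memory (translates to LIMIT in SQL).
capped_posts = Post.objects.all()[:30000]

# For an HTML front end, Django's Paginator serves one page at a time.
paginator = Paginator(Post.objects.order_by("id"), 1000)  # 1000 posts per page
first_page = paginator.page(1)
for post in first_page.object_list:
    process_post(post)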
Upvotes: 0
Reputation: 20339
Make your own iterator. For example, say you have 1 million records.
count = Post.objects.all().count()  # 1 million
chunk_size = 1000
for i in range(0, count, chunk_size):
    posts = Post.objects.all()[i:i+chunk_size]
    for post in posts:
        process_post(post)
Slicing a queryset translates to LIMIT and OFFSET in SQL. The number of queries decreases as chunk_size increases, but memory usage per batch increases. Tune it for your use case.
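For comparison, a hedged sketch of two alternatives that avoid ever-growing OFFSET scans, assuming Post has an auto-incrementing primary key (neither is part of this answer):

# Alternative 1: let Django stream rows, fetching chunk_size rows at a time
# (uses a server-side cursor on databases that support it).
for post in Post.objects.all().iterator(chunk_size=1000):
    process_post(post)

# Alternative 2: chunk on the primary key so each query is an indexed range scan
# instead of an increasingly expensive OFFSET.
last_pk = 0
chunk_size = 1000
while True:
    posts = list(Post.objects.filter(pk__gt=last_pk).order_by("pk")[:chunk_size])
    if not posts:
        break
    for post in posts:
        process_post(post)
    last_pk = posts[-1].pk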
Upvotes: 10
Reputation: 1216
My first suggestion would be to use select_related or prefetch_related. Go through the Django documentation and learn about them; they should help with your problem. But as you have said, the table has millions of records, and iterating over those will always be costly. If process_post itself is what takes the time, the best solution is a stored procedure: you can achieve your goal with a single request to your DB instead of millions of DB calls in the loop.
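A minimal sketch of the select_related / prefetch_related idea, assuming hypothetical author (ForeignKey) and tags (ManyToManyField) fields on Post that process_post touches:

# select_related joins the ForeignKey in the same query;
# prefetch_related fetches the many-to-many rows in one extra batched query.
posts = (
    Post.objects
    .select_related("author")
    .prefetch_related("tags")
)
for post in posts:
    process_post(post)  # related objects are already cached, no per-row queries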
Upvotes: 3