Himanshu dua

Reputation: 2523

Fetching a million records from Django with a queryset is slow

I want to iterate over all the objects of a table (Post). I am using the code below:

posts = Post.objects.all()
for post in posts:
    process_post(post)

process_post is a Celery task that runs in the background, and it does not update the post. The problem is that the Post table has 1 million records. This is not a one-time job; I run it daily.

for post in posts

On the line above, the query is executed and fetches all the data from the DB in one go.

How can I improve the performance? Is there a way to fetch the data in batches?

Upvotes: 5

Views: 5135

Answers (3)

Asim Ali

Reputation: 1

Django is not meant for processing data. It is a framework for building APIs, with an ORM behind the front end.

You can limit how much you process at a time, depending on your memory and database, by slicing the queryset, e.g. obj = Post.objects.all()[:30000], or 50000, etc.

If you want to show the data in an HTML front end, use pagination.

If you want to process it on the back end, don't use the Django ORM. Create a materialized view and a database job in the database itself (this is very easy in Oracle Database).

Upvotes: 0

itzMEonTV

Reputation: 20339

Make your own iterator. For example, say there are 1 million records:

count = Post.objects.count()  # about 1 million
chunk_size = 1000
for i in range(0, count, chunk_size):
    # Order by primary key so consecutive slices neither skip nor repeat rows
    posts = Post.objects.order_by("pk")[i:i + chunk_size]
    for post in posts:
        process_post(post)

Slicing a queryset translates to LIMIT and OFFSET in the SQL. The number of queries goes down as chunk_size goes up, but the memory used per chunk goes up as well. Tune it for your use case.
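One caveat with OFFSET is that the database still has to walk past all the skipped rows, so later chunks tend to get slower on a large table. A common alternative is keyset pagination: remember the last primary key seen and ask for the next rows after it. Below is a minimal sketch of that idea, not a definitive implementation. The names iter_by_keyset, fetch_page_from_memory, Row and TABLE are made up for illustration, the in-memory list stands in for the Post table so the snippet runs without a database, and it assumes positive integer primary keys.

```python
from collections import namedtuple

# Hypothetical stand-in for the Post table so the sketch runs without a database.
Row = namedtuple("Row", ["pk", "title"])
TABLE = [Row(pk, f"post {pk}") for pk in range(1, 11)]  # already sorted by pk


def fetch_page_from_memory(last_pk, limit):
    """Return up to `limit` rows whose pk is greater than `last_pk`, in pk order.

    With Django this would presumably be something like:
        list(Post.objects.filter(pk__gt=last_pk).order_by("pk")[:limit])
    """
    matching = [row for row in TABLE if row.pk > last_pk]
    return matching[:limit]


def iter_by_keyset(fetch_page, page_size=1000, start_after=0):
    """Yield rows one page at a time, resuming after the last pk seen.

    Assumes positive integer primary keys, so start_after=0 means "from the start".
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    last_pk = start_after
    while True:
        page = fetch_page(last_pk, page_size)
        if not page:
            return  # no rows left after last_pk
        for row in page:
            yield row
        last_pk = page[-1].pk  # next page starts after this key


if __name__ == "__main__":
    for row in iter_by_keyset(fetch_page_from_memory, page_size=3):
        print(row.pk, row.title)
```

Each page is then a WHERE pk > ... ORDER BY pk LIMIT ... query that can use the primary key index, instead of an OFFSET that grows with every chunk. Django's built-in QuerySet.iterator(chunk_size=...) is also worth a look if the goal is only to avoid loading every row into memory at once.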

Upvotes: 10

binu.py

Reputation: 1216

My first suggestion would be to use select_related or prefetch_related. Go through the Django documentation and learn about them; they should help if process_post accesses related objects. But as you said, the table has about a million records, and iterating through those will always be costly. If the process_post method itself is what takes the time, the best solution is to use a stored procedure. That way you can achieve your goal with a single request to the DB instead of millions of DB calls in the loop.

Upvotes: 3
