NoobEditor

Reputation: 15871

Fetching queryset data one by one

I am aware that both the regular queryset and the iterator() queryset method evaluate and return the entire data set in one shot.

For instance, take this:

my_objects = MyObject.objects.all()
for rows in my_objects:             # Way 1: regular queryset
    ...
for rows in my_objects.iterator():  # Way 2: iterator()
    ...

Question

In both methods, all the rows are fetched in a single go. Is there any way in Django to fetch the queryset rows one by one from the database?

Why this weird Requirement

At present my query fetches, let's say, n rows, but sometimes I get an OperationalError (2006, 'MySQL server has gone away').

As a workaround, I am currently using a weird while-loop. So I was wondering whether there is any native or built-in method for this, or whether my question is even logical in the first place! :)

Upvotes: 5

Views: 511

Answers (3)

e4c5

Reputation: 53734

Fetching row by row might be even worse. You probably want to retrieve rows in batches of 1000 or so. I have used this Django snippet (not my work) successfully with very large querysets. It doesn't eat up memory, and there is no trouble with connections going away.

Here's the snippet from that link:

import gc

def queryset_iterator(queryset, chunksize=1000):
    '''
    Iterate over a Django QuerySet ordered by the primary key.

    This method loads a maximum of chunksize (default: 1000) rows into
    memory at a time, while Django normally would load all rows into
    memory. Using the iterator() method only avoids preloading all the
    model instances.

    Note that the implementation of the iterator does not support ordered
    querysets.
    '''
    pk = 0
    last_pk = queryset.order_by('-pk')[0].pk
    queryset = queryset.order_by('pk')
    while pk < last_pk:
        for row in queryset.filter(pk__gt=pk)[:chunksize]:
            pk = row.pk
            yield row
        gc.collect()
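
A minimal usage sketch of the snippet above (MyObject is the model from the question; process() is a hypothetical placeholder for your per-row work):

for obj in queryset_iterator(MyObject.objects.all(), chunksize=500):
    process(obj)  # rows arrive in pk order, at most 500 per database query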

Upvotes: 2

Amit Jaiswal

Reputation: 1015

To solve the (2006, 'MySQL server has gone away') problem, your approach is not that logical. Hitting the database for each entry will increase the number of queries, which will itself create problems in the future as usage of your application grows. I think you should close the MySQL connection after iterating over all elements of the result; if you then make another query, Django will create a new connection.

from django.db import connection

connection.close()
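
A minimal sketch of that pattern (process() is a hypothetical placeholder), assuming you iterate first and close the connection afterwards:

from django.db import connection

for row in MyObject.objects.all().iterator():
    process(row)  # hypothetical per-row work

# Closing the (possibly stale) connection is safe; Django opens a new
# one automatically the next time a query is made.
connection.close()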

Refer to this for more details.

Upvotes: 1

Sayse

Reputation: 43300

I think you are looking to limit your query set.

Quote from the above link:

Use a subset of Python’s array-slicing syntax to limit your QuerySet to a certain number of results. This is the equivalent of SQL’s LIMIT and OFFSET clauses.

In other words, if you start with a count, you can then loop over it and take slices as you require them.

cnt = MyObject.objects.count()
start_point = 0
inc = 5
while start_point < cnt:
    filtered = MyObject.objects.all()[start_point:start_point + inc]
    # work with filtered here
    start_point += inc

Of course, you may need to add more error handling to this.
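
For illustration, a sketch of the same loop with the rows actually consumed (process() is again a hypothetical placeholder); note that each slice runs as its own LIMIT/OFFSET query:

cnt = MyObject.objects.count()
start_point = 0
inc = 5
while start_point < cnt:
    for row in MyObject.objects.all()[start_point:start_point + inc]:
        process(row)  # hypothetical per-row work
    start_point += inc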

Upvotes: 3
