KnewB

Reputation: 363

Filtering an evaluated QuerySet

I am evaluating one QuerySet and then another, where the second is a subset of the first. I'm trying to do this efficiently, with as few database calls as possible. (This has been asked before, but to be honest I didn't fully understand the answers, and I'm not sure they completely apply to what I'm thinking about.)

Using the example Weblog models from the Django docs, this is the code in a view before trying to optimise:

myblog = Blog.objects.get(pk=1)

d = {}  # going to pass this to my template

# not using count()
d['num_of_entries'] = len(myblog.entry_set.all()) 

# not using exists()
d['is_jolly'] = bool(Entry.objects.filter(blog=myblog, headline__startswith='Jolly'))

# ... other code but no further use of database in this view

The second QuerySet is a subset of the first. Should I try to use pure Python to get the subset (and so evaluate only one QuerySet, making one fewer database call)?

Or, perhaps simply do the following?

# other code as above

d['num_of_entries'] = myblog.entry_set.count()

d['is_jolly'] = Entry.objects.filter(blog=myblog, headline__startswith='Jolly').exists()

# ... other code but no further use of database in this view

Upvotes: 3

Views: 1185

Answers (1)

Gareth Rees

Reputation: 65854

"As few database queries as possible" isn't the right criterion. You also want to think about the amount of work done by those database queries, and the amount of data that's transferred from the database to your Django server.

Let's consider the two ways to implement what you're after. Approach 1:

entries = myblog.entry_set.all()
num_of_entries = len(entries)
is_jolly = any(e.headline.startswith('Jolly') for e in entries)

Approach 2:

num_of_entries = myblog.entry_set.count()
is_jolly = Entry.objects.filter(blog=myblog, headline__startswith='Jolly').exists()

In approach 1 there's a single database query, but that query is going to be something like SELECT * FROM ENTRY WHERE .... It potentially fetches a large number of entries and transmits all their content across the network to your Django server, which then throws nearly all of that content away (the only field it actually looks at is the headline field).

In approach 2 there are two database queries, but the first one is a SELECT COUNT(*) FROM ENTRY WHERE ... which fetches just one integer, and the second is a SELECT EXISTS(...) which fetches just one Boolean. Since these queries don't have to fetch all the matching entries, there are a lot more possibilities for the database to optimize the queries.

So approach 2 looks much better in this case, even though it issues more queries.
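To make the difference concrete, here is a minimal sketch that uses Python's built-in sqlite3 module rather than Django, so it runs without a project. The entry table, its column names, and the row counts are assumptions modelled loosely on the Weblog Entry model, and the SQL only approximates what the ORM would issue for each approach.

```python
import sqlite3

# Hypothetical stand-in for the Entry table; names and sizes are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry ("
    "id INTEGER PRIMARY KEY, blog_id INTEGER, headline TEXT, body_text TEXT)"
)
rows = [(1, f"Headline {i}", "x" * 1000) for i in range(500)]
rows.append((1, "Jolly good news", "x" * 1000))
rows.append((2, "Jolly news on another blog", "x" * 1000))
conn.executemany(
    "INSERT INTO entry (blog_id, headline, body_text) VALUES (?, ?, ?)", rows
)

# Approach 1: one query, but every column of every matching row is fetched.
all_rows = conn.execute(
    "SELECT * FROM entry WHERE blog_id = ?", (1,)
).fetchall()
num_of_entries_1 = len(all_rows)
is_jolly_1 = any(r[2].startswith("Jolly") for r in all_rows)
chars_fetched_1 = sum(len(r[2]) + len(r[3]) for r in all_rows)

# Approach 2: two queries, each returning a single value.
(num_of_entries_2,) = conn.execute(
    "SELECT COUNT(*) FROM entry WHERE blog_id = ?", (1,)
).fetchone()
# Note: LIKE is case-insensitive for ASCII in SQLite by default.
(exists_flag,) = conn.execute(
    "SELECT EXISTS("
    "SELECT 1 FROM entry WHERE blog_id = ? AND headline LIKE 'Jolly%')",
    (1,),
).fetchone()
is_jolly_2 = bool(exists_flag)

print(num_of_entries_1, is_jolly_1, chars_fetched_1)
print(num_of_entries_2, is_jolly_2)
```

Both approaches give the same answers, but the first pulls over 500,000 characters of text into Python to produce them, while the second receives two scalars. In a real project, logging the SQL that Django actually emits would be the way to confirm what each QuerySet does.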

Upvotes: 7
