I'm trying to optimize some Django code, and I've got two similar approaches that perform differently. Here are some example models:
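from django.db import models
class A(models.Model):
    name = models.CharField(max_length=100)
class B(models.Model):
    name = models.CharField(max_length=100)
    a = models.ForeignKey(A, on_delete=models.CASCADE)
    # 'C' is passed as a string because C is defined after B
    c = models.ForeignKey('C', on_delete=models.CASCADE)
class C(models.Model):
    name = models.CharField(max_length=100)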
For each A object, I'd like to iterate over a subset of its related B objects, filtered on their c value. Simple:
for a in A.objects.all():
    for b in a.b_set.filter(c__name='some_val'):
        print(a.name, b.name, b.c.name)
The problem with this is that there is a new database lookup for every a value iterated over.
It seems that the solution is to prefetch the c values which will feed into the filter.
qs_A = A.objects.all().prefetch_related('b_set__c')
Now consider the following two filter approaches:
# Django filter
for a in qs_A:
    for b in a.b_set.filter(c__name='some_val'):
        print(a.name, b.name, b.c.name)
# Python filter
for a in qs_A:
    for b in filter(lambda b: b.c.name == 'some_val', a.b_set.all()):
        print(a.name, b.name, b.c.name)
With the data I'm using, the Django filter makes 48 more SQL queries than the Python filter (on a 12-element qs_A result set). This makes me believe that the Django filter doesn't make use of the prefetched tables.
Could someone explain what is happening?
Perhaps it's possible to apply the filter during the prefetch?
Reputation: 50776
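Prefetching and filtering don't have any direct connection. Filtering always happens inside your database, so calling .filter() on a prefetched relation issues a fresh query instead of reusing the prefetched objects, whereas prefetch_related's main purpose is to load related objects in bulk so they are already available when you iterate over or output them.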
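To make that difference concrete, here is a toy model of the behaviour in plain Python. It is not Django code: FakeRelatedManager, its predicate-based filter() and its query counter are hypothetical stand-ins for something like a.b_set, and it only assumes that all() can be served from a cache while filter() cannot.

```python
class FakeRelatedManager:
    """Toy stand-in for a reverse manager such as a.b_set (not Django code)."""

    def __init__(self, rows, prefetched=False):
        self._rows = rows  # pretend these rows live in the database
        # The bulk query that fills a real prefetch cache is not counted here.
        self._cache = list(rows) if prefetched else None
        self.queries = 0  # number of simulated database hits

    def _hit_database(self, predicate=None):
        self.queries += 1
        return [r for r in self._rows if predicate is None or predicate(r)]

    def all(self):
        # Served from the prefetch cache when one exists.
        if self._cache is not None:
            return list(self._cache)
        return self._hit_database()

    def filter(self, predicate):
        # Always a fresh lookup; the cache is ignored.
        return self._hit_database(predicate)


manager = FakeRelatedManager(
    [{"name": "b1", "c": "some_val"}, {"name": "b2", "c": "other"}],
    prefetched=True,
)

via_filter = manager.filter(lambda r: r["c"] == "some_val")
queries_after_filter = manager.queries

via_python = [r for r in manager.all() if r["c"] == "some_val"]
queries_after_python = manager.queries
```

Both approaches return the same rows, but only the filter() call adds to the simulated query count, which mirrors the gap you measured.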
Fewer SQL queries are usually better, but if you want to optimize your use case you should do some benchmarking and profiling rather than rely on general statements!
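Counting queries is probably the simplest form of such profiling. The sketch below uses the standard-library sqlite3 module instead of the Django ORM, purely so that it runs without a project; the tables, sample rows and the CountingConnection helper are all made up for illustration. It compares one filtered lookup per a row with a single joined query.

```python
import sqlite3


class CountingConnection:
    """Hypothetical helper: runs SQL and counts how many statements were sent."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def run(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE c (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT,
                    a_id INTEGER REFERENCES a(id),
                    c_id INTEGER REFERENCES c(id));
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO c VALUES (1, 'some_val'), (2, 'other');
    INSERT INTO b VALUES (1, 'b1', 1, 1), (2, 'b2', 1, 2),
                         (3, 'b3', 2, 1), (4, 'b4', 3, 2);
""")

# Approach 1: one filtered lookup per row of a (the nested-loop pattern).
db = CountingConnection(conn)
per_parent = []
for a_id, a_name in db.run("SELECT id, name FROM a ORDER BY id"):
    rows = db.run(
        "SELECT b.name, c.name FROM b JOIN c ON c.id = b.c_id "
        "WHERE b.a_id = ? AND c.name = ? ORDER BY b.id",
        (a_id, "some_val"),
    )
    per_parent.extend((a_name, b_name, c_name) for b_name, c_name in rows)
per_parent_queries = db.count

# Approach 2: start from b and join a and c in a single query.
db = CountingConnection(conn)
joined = db.run(
    "SELECT a.name, b.name, c.name FROM b "
    "JOIN a ON a.id = b.a_id JOIN c ON c.id = b.c_id "
    "WHERE c.name = ? ORDER BY a.id, b.id",
    ("some_val",),
)
joined_queries = db.count

print(per_parent_queries, joined_queries)
```

In a real Django project you would not need such a helper: with DEBUG = True, django.db.connection.queries records the executed statements, so comparing its length before and after each loop gives the same kind of count.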
You could probably make your example more efficient by not starting from A in the first place, but from B instead:
qs = B.objects.select_related('a', 'c').filter(c__name='some_val')
# maybe you need some filtering on a as well:
# qs = qs.filter(a__x=....)
for b in qs:
    print(b.a.name, b.name, b.c.name)
You may need to do some regrouping or ordering in Python after filtering, but if you can perform all the filtering in one step it will be more efficient. Otherwise, maybe look at raw SQL queries.
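The regrouping step could look roughly like this. It is only a sketch: Row is a hypothetical namedtuple standing in for the B instances the queryset would return, and the sample data is invented. The one thing to remember is that itertools.groupby only merges adjacent items, so the rows have to be sorted by the grouping key first (or ordered in the query itself).

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

# Hypothetical stand-in for B objects with their related a and c already loaded.
Row = namedtuple("Row", ["a_name", "b_name", "c_name"])

rows = [
    Row("a2", "b3", "some_val"),
    Row("a1", "b1", "some_val"),
    Row("a1", "b5", "some_val"),
]

# groupby only groups consecutive items, so sort by the grouping key first.
rows.sort(key=attrgetter("a_name"))

grouped = {
    a_name: [row.b_name for row in group]
    for a_name, group in groupby(rows, key=attrgetter("a_name"))
}

for a_name, b_names in grouped.items():
    print(a_name, b_names)
```

This gives you back the "for each a, its matching b's" shape of the original nested loops while keeping everything in one database query.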