ajwood

Reputation: 19027

django filtering vs python filtering with prefetched objects

I'm trying to optimize some Django code, and I've got two similar approaches that perform differently. Here are some example models:

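from django.db import models

class A(models.Model):
    name = models.CharField(max_length=100)

class B(models.Model):
    name = models.CharField(max_length=100)
    a = models.ForeignKey(A, on_delete=models.CASCADE)
    # 'C' is given as a string because C is defined further down
    c = models.ForeignKey('C', on_delete=models.CASCADE)

class C(models.Model):
    name = models.CharField(max_length=100)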

For each A object, I'd like to iterate over a subset of its related B objects, filtered on their c value. Simple:

for a in A.objects.all():
    # the default reverse accessor for B is the lowercase name b_set
    for b in a.b_set.filter(c__name='some_val'):
        print(a.name, b.name, b.c.name)

The problem with this is that there is a new database lookup for every a value iterated over.

It seems that the solution is to prefetch the c values which will feed into the filter.

qs_A = A.objects.all().prefetch_related('b_set__c')

Now consider the following two filter approaches:

# Django filter
for a in qs_A:
    for b in a.b_set.filter(c__name='some_val'):
        print(a.name, b.name, b.c.name)

# Python filter
for a in qs_A:
    for b in filter(lambda b: b.c.name == 'some_val', a.b_set.all()):
        print(a.name, b.name, b.c.name)

With the data I'm using, the Django filter makes 48 more SQL queries than the Python filter (on a 12-element qs_A result set). This makes me believe that the Django filter doesn't make use of the prefetched objects.

Could someone explain what is happening?

Perhaps it's possible to apply the filter during the prefetch?

Upvotes: 1

Views: 383

Answers (1)

Bernhard Vallant

Reputation: 50776

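Prefetching and filtering don't have any direct connection. Calling filter() on a related manager always builds a new queryset, so the filtering happens inside your database and the prefetched objects are not used. The main purpose of prefetch_related is to fetch related objects up front for when you access them through all(), for example when outputting them.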
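
To make the difference concrete, here is a toy model in plain Python of a related manager with a prefetch cache. It is not Django's actual implementation, and the names FakeDB, ToyRelatedManager and prefetch are made up for illustration. The only point is that all() can be served from the cache while filter() always goes back to the "database":

```python
class FakeDB:
    """Stand-in for a database that counts how many queries it receives."""

    def __init__(self, rows):
        self.rows = rows  # each row: {'a_id': ..., 'name': ..., 'c_name': ...}
        self.query_count = 0

    def select(self, **conditions):
        self.query_count += 1
        return [r for r in self.rows
                if all(r[key] == value for key, value in conditions.items())]


class ToyRelatedManager:
    """Hypothetical related manager for the B rows of one A."""

    def __init__(self, db, a_id):
        self.db = db
        self.a_id = a_id
        self._prefetched = None  # filled in by prefetch()

    def all(self):
        # Served from the cache when one exists: no query.
        if self._prefetched is not None:
            return list(self._prefetched)
        return self.db.select(a_id=self.a_id)

    def filter(self, **conditions):
        # Always a fresh query; the cache is never consulted.
        return self.db.select(a_id=self.a_id, **conditions)


def prefetch(db, managers):
    """One query for all parents, then hand the rows out to each manager."""
    db.query_count += 1
    by_parent = {}
    for row in db.rows:
        by_parent.setdefault(row['a_id'], []).append(row)
    for manager in managers:
        manager._prefetched = by_parent.get(manager.a_id, [])


rows = [
    {'a_id': 1, 'name': 'b1', 'c_name': 'some_val'},
    {'a_id': 1, 'name': 'b2', 'c_name': 'other'},
    {'a_id': 2, 'name': 'b3', 'c_name': 'some_val'},
]
db = FakeDB(rows)
managers = [ToyRelatedManager(db, a_id) for a_id in (1, 2)]
prefetch(db, managers)

# Filtering in Python on top of all(): no further queries.
python_filtered = [[b['name'] for b in m.all() if b['c_name'] == 'some_val']
                   for m in managers]
queries_after_python_filter = db.query_count

# Filtering through filter(): one extra query per parent.
db_filtered = [[b['name'] for b in m.filter(c_name='some_val')]
               for m in managers]
queries_after_db_filter = db.query_count
```

Both approaches produce the same names, but the second one adds one query per parent. In real Django the objects returned by such a fresh query also don't carry the prefetched c, so accessing b.c on them can cost further queries.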

Fewer SQL queries are usually better, but if you want to optimize your use case you should do some benchmarking and profiling rather than rely on general statements!

You could probably make your example more efficient if you didn't start from A in the first place but from B instead:

qs = B.objects.select_related('a', 'c').filter(c__name='some_val')
# maybe you need some filtering for a as well:
# qs = qs.filter(a__x=....)
for b in qs:
    print(b.a.name, b.name, b.c.name)
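
As a rough illustration of why this helps, the sketch below uses the standard-library sqlite3 module instead of Django. The table names (a, b, c), the sample rows and the log_select helper are assumptions of mine, not what Django generates. It counts SELECT statements for a per-A loop versus a single JOIN, which is approximately the shape of query that select_related plus filter would produce:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE c (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT,
                    a_id INTEGER REFERENCES a(id),
                    c_id INTEGER REFERENCES c(id));
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO c VALUES (1, 'some_val'), (2, 'other');
    INSERT INTO b VALUES (1, 'b1', 1, 1), (2, 'b2', 1, 2),
                         (3, 'b3', 2, 1), (4, 'b4', 3, 2);
""")

select_log = []


def log_select(statement):
    # Record only SELECT statements so the counts reflect read queries.
    if statement.lstrip().upper().startswith('SELECT'):
        select_log.append(statement)


conn.set_trace_callback(log_select)

# Approach 1: one query for the parents, then one filtered query per parent.
per_parent = []
for a_id, a_name in conn.execute('SELECT id, name FROM a ORDER BY id').fetchall():
    rows = conn.execute(
        'SELECT b.name, c.name FROM b JOIN c ON c.id = b.c_id '
        'WHERE b.a_id = ? AND c.name = ? ORDER BY b.id',
        (a_id, 'some_val'),
    ).fetchall()
    per_parent.extend((a_name, b_name, c_name) for b_name, c_name in rows)
per_parent_queries = len(select_log)

# Approach 2: start from b and join a and c in a single query.
del select_log[:]
joined = conn.execute(
    'SELECT a.name, b.name, c.name FROM b '
    'JOIN a ON a.id = b.a_id JOIN c ON c.id = b.c_id '
    'WHERE c.name = ? ORDER BY a.id, b.id',
    ('some_val',),
).fetchall()
joined_queries = len(select_log)
```

With three rows in a, the loop issues four SELECTs and the join issues one, for the same result rows. Whether that translates into a real speedup on your data is still something to measure.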

Maybe you'll need to do some regrouping or ordering in Python after filtering, but if you can perform all the filtering in one step it will be more efficient. Otherwise, maybe look at raw SQL queries.
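
The regrouping step could look something like the following sketch, which works on plain tuples rather than model instances. The sample rows are invented; with real objects you would key on b.a instead of the first tuple element:

```python
from itertools import groupby
from operator import itemgetter

# (a name, b name, c name) tuples, as a flat filtered result might yield them
rows = [
    ('a1', 'b1', 'some_val'),
    ('a2', 'b3', 'some_val'),
    ('a1', 'b5', 'some_val'),
]

# groupby only merges adjacent items, so sort by the grouping key first.
rows.sort(key=itemgetter(0))

grouped = {
    a_name: [b_name for _, b_name, _ in group]
    for a_name, group in groupby(rows, key=itemgetter(0))
}
```

If the queryset is already ordered by a, the sort can be skipped and groupby applied directly.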

Upvotes: 1
