rjd
rjd

Reputation: 31

Django filter() Increases Result QuerySet Rather than Reducing

In testing my Django application I ran across a performance issue with a particular query when a particular filter() was used. The issue was solved, but I don't understand why.

I'm filtering a QuerySet based on 3 criteria. When including them all at once, I get the expected number of results. When moving one of the clauses to its own filter() call, the resulting QuerySet is much, much larger than it should be and is filled with seemingly duplicate entries.

See below for an illustration of this. Moving the result__is_false=False clause to its own filter() causes a huge expansion in the resulting QuerySet.

>>> Finding.objects.filter(project=1, result__scan_session__is_enabled=True,\ 
    result__is_false=False).count()
3566

>>> Finding.objects.filter(project=1, result__scan_session__is_enabled=True)\
    .filter(result__is_false=False).count()
10050380

The problem was that I needed to conditionally apply the filter in question, not every time. I solved the problem by converting the filter() routine to an exclude() routine like this:

>>> Finding.objects.filter(project=1, result__scan_session__is_enabled=True)\
    .exclude(result__is_false=True).count()
3566

That seemed to have done the trick!

While I'm happy that this worked, I don't understand two things:

1) Why posts explaining filter() and exclude() seem to suggest that they can be treated as equal but opposite, when based on the example above, that is not always the case.

2) Why moving the result__is_false=False clause to its own filter() caused the huge jump in results in the first place.

Does anyone have any insight into either of these?

Upvotes: 3

Views: 446

Answers (1)

Peter DeGlopper
Peter DeGlopper

Reputation: 37319

Both of these look like the expected result of how successive filter calls work on multivalued relationships. The docs explain in detail, with examples:

Everything inside a single filter() call is applied simultaneously to filter out items matching all those requirements. Successive filter() calls further restrict the set of objects, but for multi-valued relations, they apply to any object linked to the primary model, not necessarily those objects that were selected by an earlier filter() call.

So your second filter call returns any Finding where there's at least one non-false result. The exclude call returns any finding where there's not at least one false result. They're interchangeable when querying non-nullable fields on single values, but not in all cases.

Upvotes: 3

Related Questions