Andrew Gorcester
Andrew Gorcester

Reputation: 19973

Django queryset exclude() with multiple related field clauses

I am creating a sparse preferences table in Django. My model is simple:

class Preference(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, related_name='preferences')
    preference = models.CharField(max_length=255, db_index=True)
    value = models.BooleanField()

Some preferences have default states, so I need to be able to ask the database two questions: "Which users have this preference set to a certain value?" and "Which users do not have this preference set to that value (either because they do not have the preference set or because they have proactively set the preference to another value)?"

My problem is that the former question works, but the latter question (the same query clauses, but with an exclude() instead of a filter()) does not work. For instance:

My test database has 14 users, and a single user has two preferences set: 'PREF_A' is set to True and 'PREF_B' is set to False.

>>> User.objects.all().count()
14
>>> User.objects.filter(preferences__preference="PREF_A", preferences__value=True).count()
1
>>> User.objects.exclude(preferences__preference="PREF_A", preferences__value=True).count()
13
>>> User.objects.filter(preferences__preference="PREF_A", preferences__value=False).count()
0
>>> User.objects.exclude(preferences__preference="PREF_A", preferences__value=False).count()
13

So, my results say that:

There are 14 users in total

Where is this query going wrong, and how can I write the query to properly exclude people who have a specific preference set to a specific value?

I have tried using Q and ~Q to see if the behavior would be different, but the results were the same.

Upvotes: 18

Views: 17066

Answers (3)

M Somerville
M Somerville

Reputation: 4610

This is a still-present gotcha in Django, where exclude() is not acting as the reverse of filter(). Here's the documentation explaining the difference:

Note

The behavior of filter() for queries that span multi-value relationships, as described above, is not implemented equivalently for exclude(). Instead, the conditions in a single exclude() call will not necessarily refer to the same item.

For example, the following query would exclude blogs that contain both entries with “Lennon” in the headline and entries published in 2008:

Blog.objects.exclude(
    entry__headline__contains='Lennon',
    entry__pub_date__year=2008,
)

However, unlike the behavior when using filter(), this will not limit blogs based on entries that satisfy both conditions. In order to do that, i.e. to select all blogs that do not contain entries published with “Lennon” that were published in 2008, you need to make two queries:

Blog.objects.exclude(
    entry__in=Entry.objects.filter(
        headline__contains='Lennon',
        pub_date__year=2008,
    ),
)

What you've done is probably the way to go.

Upvotes: 33

Patrik Beck
Patrik Beck

Reputation: 2505

You can use the new django SubQuery to avoid doing 2 queries to the server:

User.objects.exclude(id__in=SubQuery(User.objects.filter(preferences__preference="PREF_A", preferences__value=True)))

Upvotes: 2

Andrew Gorcester
Andrew Gorcester

Reputation: 19973

I implemented a quick-and-dirty solution so I could move on, expecting it to be horribly inefficient; however, on inspection of the SQL generated, it turned out to not be that bad:

>>> User.objects.exclude(id__in=User.objects.filter(preferences__preference="PREF_A", preferences__value=True))

I thought that the ORM would load the results of the subordinate query into the webserver's memory before completing (this is a problem because our production app will have in the millions of users), but in fact it properly uses a subquery:

>>> User.objects.exclude(id__in=User.objects.filter(preferences__preference="PREF_A", preferences__value=True)).values('id').query.sql_with_params()
(u'SELECT "sgauth_user"."id" FROM "sgauth_user" WHERE NOT ("sgauth_user"."id" IN (SELECT U0."id" FROM "sgauth_user" U0 INNER JOIN "feeds_preference" U1 ON (U0."id" = U1."user_id") WHERE (U1."preference" = %s  AND U1."value" = %s )))', ('PREF_A', True))

I'm putting this up as one possible answer, but I am still interested if there is a way to do it with the straightforward exclude clause, or a way of generating a query through the ORM that works with straightforward joins and no subquery whatsoever.

Upvotes: 6

Related Questions