Reputation: 6164
I apologize for formatting eyesores in advance. Feel free to edit my question for better readability.
I have four models:
class Datasets(models.Model):
name = models.CharField(max_length=150)
description = models.TextField()
class Assay(models.Model):
dataset = models.ForeignKey(Datasets)
name = models.CharField(max_length=150)
type = models.CharField(max_length=150)
class Compounds(models.Model):
dataset = models.ForeignKey(Datasets)
name = models.TextField()
deleted = models.BooleanField(default=False)
class Assays(models.Model):
compound = models.ForeignKey(Compounds)
assay = models.ForeignKey(Assay)
value = models.DecimalField(max_digits=30, decimal_places=16)
deleted = models.BooleanField(default=False)
I am building queries with user input depending on the Assay
picked. I am using JOIN
s to filter results based on reverse relations. User picks the Assay
and I filter the compounds based on the selection. Users can also pick 'No Assay' option, which should return the compounds with no registered assays (i.e. no entries in Assays
model for that compound).
selected_assay_id = 5 # Received from frontend
no_assay_option_selected = True/False # Received from frontend
dataset_id = 1
filter_query = Q()
filter_query.add(Q(**{
'assays__assay__id': selected_assay_id,
'assays__compound__id': F('id'),
'assays__deleted': False
}), Q.OR)
if no_assay_option_selected:
filter_query.add(~Q(**{
'assays__deleted': False,
'assays__compound__id': F('id')
}), Q.OR)
compounds = Compounds.objects.filter(filter_query, dataset__id=dataset_id).distinct()
When I pick an assay, it works great. When I pick 'No Assay', it works great. But when I pick an assay and also 'No Assay', all the compounds are returned instead of the ones with the picked assay and 'no assay'. When I checked the raw SQL queries, I realized latter query has an additional piece:
-- Only 'No Assay' option selected
SELECT DISTINCT * FROM "compounds" WHERE (
NOT ("compounds"."id" IN
(SELECT U1."compound_id" FROM "compounds" U0
INNER JOIN "assays" U1 ON (U0."id" = U1."compound_id")
WHERE U1."compound_id" = (U0."id")) AND "compounds"."id" IN
(SELECT U1."compound_id" FROM "assays" U1
WHERE U1."deleted" = False)
)
AND "compounds"."dataset_id" = 1 AND "compounds"."deleted" = False
)
-- An assay is selected
SELECT DISTINCT * FROM "compounds"
INNER JOIN "assays" ON ("compounds"."id" = "assays"."compound_id")
WHERE (
"assays"."assay_id" = 5
AND "assays"."compound_id" = ("compounds"."id")
AND "assays"."deleted" = False
AND "compounds"."dataset_id" = 1
AND "compounds"."deleted" = False
)
-- An assay and 'No Assay' option selected
SELECT DISTINCT * FROM "compounds"
LEFT OUTER JOIN "assays" ON ("compounds"."id" = "assays"."compound_id")
WHERE (
(
(
"assays"."assay_id" = 5
AND "assays"."compound_id" = ("compounds"."id")
AND "assays"."deleted" = False
) OR NOT (
"compounds"."id" IN
(SELECT U1."compound_id" FROM "compounds" U0
INNER JOIN "assays" U1 ON (U0."id" = U1."compound_id")
WHERE (U1."compound_id" = (U0."id") AND U1."id" = ("assays"."id"))
) AND "compounds"."id" IN
(SELECT U1."compound_id" FROM "assays" U1
WHERE (U1."deleted" = False AND U1."id" = ("assays"."id"))
)
)
)
AND "compounds"."dataset_id" = 1 AND "compounds"."deleted" = False
)
This is the extra piece in last query: AND U1."id" = ("assays"."id")
that causes funky results. When I remove it and run the raw query, I get the desired results.
My question is: why is Django doing this and how can I fix it?
Upvotes: 0
Views: 55
Reputation: 2434
You may have stumbled onto an actual Django bug here. The behaviour persists even when doing an INNER JOIN
. If you modify your filter contructor in the no_assay_option_selected
if block to use a Q.AND
(even though it doesn't make logical sense to make such a query), you'll find that an INNER JOIN
query is generated, with the AND U1."id" = ("assays"."id")
statement still present.
There is a workaround you can use, however:
filter_query = Q(
assays__assay__id=selected_assay_id,
assays__deleted=False
)
if no_assay_option_selected:
filter_query |= ~Q(
id__in=Assays.objects.filter(deleted=False).values_list('compound_id'))
)
compounds = Compounds.objects.filter(filter_query, dataset__id=dataset_id).distinct()
Notice I've also removed your use of 'assays__compound__id': F('id')
in constructing the queries. It's unnecessary, since that's the condition that the join is made ON
, so tacking it on as a WHERE
does nothing useful.
The SQL that gets generated by the above should be:
SELECT DISTINCT * FROM "compounds"
LEFT OUTER JOIN "assays" ON ("compounds"."id" = "assays"."compound_id")
WHERE (
(
(
"assays"."assay_id" = 5
AND "assays"."deleted" = False
) OR NOT (
"compounds"."id" IN
(SELECT U0."compound_id" FROM "assays" U0 WHERE U0."deleted" = false)
)
)
AND "compounds"."dataset_id" = 1 AND "compounds"."deleted" = False
)
Which is what you want, as far as I can tell. The join still needs to be a LEFT OUTER
to include compounds without assays.
Upvotes: 1
Reputation: 6164
I figured Django ORM prefers using OUTER JOIN
over INNER JOIN
even when inner join is exactly what you want, like in my case. Luckily there's a way to force ORM using inner join:
Compounds.objects.filter(assays__isnull=False).filter(filter_query, dataset__id=dataset_id).distinct()
The first filter tells using INNER JOIN
is safe (Reference).
Upvotes: 0