isar
isar

Reputation: 1791

Query excluding duplicates in Django

I'm using distinct() QuerySet to get some data in Django.
My initial query was Point.objects.order_by('chron', 'pubdate').
The field chron in some cases is a duplicate so I changed the query to Point.objects.order_by('chron', 'pubdate').distinct('chron') in order to exclude duplicates. Now the problem is that all empty fields are considered duplicates.

To be accurate, the chron field contain integers (which behave similar to ids), in some cases it can be a duplicate, in some cases it can be NULL.

| chron |
|-------|
| 1     | I want this
| 2     | I want this
| 3     | I want this
| 3     |
| NULL  |
| 4     | I want this
| NULL  |

I want to exclude all the chron duplicates but not if they are duplicate of NULL. Thank you.

Upvotes: 0

Views: 115

Answers (1)

matthew.
matthew.

Reputation: 176

Use two separate queries.

  • .distinct("chron").exclude(chron__isnull=True)

  • .filter() for only chron values where chron__isnull=True.

Although this seems pretty inefficient I believe (I will happily be corrected) that even any sensible vanilla SQL statement (eg. below) would require multiple table scans to join a result set of nulls and unique values.

SELECT *
FROM (
    SELECT chron
    FROM Point
    WHERE chron IS NOT NULL  # .exclude()
    GROUP BY chron  # .distinct()

    UNION ALL

    SELECT chron
    FROM Point
    WHERE chron IS NULL  # .include()
)

Upvotes: 2

Related Questions