techie11
techie11

Reputation: 1387

Self join in Django

To perform a query with self-join on the following table:

> select * from test_users
id  email
--- -----------
1   "[email protected]"
2   "[email protected]"
3   "[email protected]"

I can use SQL:

> select u1.id u1id, u2.id u2id from test_users u1 inner join test_users u2 on u1.email=u2.email and u1.id !=u2.id
u1id   u2id
-----  ------
1      2
2      1

Question:

  1. How can I write this in Django ORM?
  2. If I want to remove the duplicate so that I get only 1 row in above example, how can I achieve that in Django?

Upvotes: 3

Views: 533

Answers (1)

willeM_ Van Onsem
willeM_ Van Onsem

Reputation: 476534

You can retrieve the Users for which there is another item with a primary key that is greater than (or less than) the primary key of the user with an Exists subquery [Django-doc]:

from django.db.models import Exists, OuterRef

Users.objects.filter(
    Exists(
        User.objects.filter(
            pk__gt=OuterRef('pk'),
            email=OuterRef('email')
        )
    )
)

If you thus call .delete() on these, you will remove all Users for wich another Users object exists with a greater primary key.

Prior to , one should move the Exists subquery to an .annotate(…) clause, and then filter on this:

from django.db.models import Exists, OuterRef

Users.objects.annotate(
    has_other=Exists(
        User.objects.filter(
            pk__gt=OuterRef('pk'),
            email=OuterRef('email')
        )
    )
).filter(has_other=True)

Upvotes: 1

Related Questions