user2937998
user2937998

Reputation: 367

Django: remove duplicates (group by) from queryset by related model field

I have a Queryset with a couple of records, and I wan't to remove duplicates using the related model field. For example:

class User(models.Model):
    group = models.ForeignKey('Group')
    ...

class Address(models.Model):
    ...
    models.ForeignKey('User')

addresses = Address.objects.filter(user__group__id=1).order_by('-id')

This returns a QuerySet of Address records, and I want to group by the User ID.

I could:

addresses = Address.objects.filter(user__group__id=1).order_by('-id')
unique_users_ids = []
unique_addresses = []

for address in addresses:
    if address.user.id not in unique_users_ids:
        unique_addresses.append(address)
        unique_users_ids.append(address.user.id)

print unique_addresses # TA-DA!

But it seems too much for a simple thing like a group by (damn you Django).

Is there a easy way to achieve this?

Upvotes: 3

Views: 5959

Answers (1)

willeM_ Van Onsem
willeM_ Van Onsem

Reputation: 476709

By using .distinct() with a field name

Django has also a .distinct(..) function that takes as input column the column names that should be unique. Alas most database systems do not support this (only PostgreSQL to the best of my knowledge). But in PostgreSQL we can thus perform:

# Limited number of database systems support this
addresses = (Address.objects
                    .filter(user__group__id=1)
                    .order_by('-id')
                    .distinct('user_id'))

By using two queries

Another way to handle this is by first having a query that works over the users, and for each user obtains the largest address_id:

from django.db.models import Max

address_ids = (User.objects
                   .annotate(address_id=Max('address_set__id'))
                   .filter(address_id__isnull=False)
                   .values_list('address_id'))

So now for every user, we have calculated the largest corresponding address_id, and we eliminate Users that have no address. We then obtain the list of ids.

In a second step, we then fetch the addresses:

addresses = Address.objects.filter(pk__in=address_ids)

Upvotes: 7

Related Questions