raj-kapil
raj-kapil

Reputation: 447

How to get all the duplicated records in django?

I have a model which contains various fields like name, start_time, start_date. I want to create an API that reads my model and displays all the duplicated records.

Upvotes: 0

Views: 580

Answers (2)

raj-kapil
raj-kapil

Reputation: 447

The above answer works, but I just wanna post my code as well so that other newbies won't have to look for other answers.

from django.db.models import Count

def find_duplicates(request):
    if request.method == 'GET':
        duplicates = Model.objects.values('field1', 'field2', 'field3')\
            .annotate(field1_count=Count('field1'),
                      field2_count=Count('field2'),
                      field3_count=Count('field3')
                      ) \
            .filter(field1_count__gt=1,
                    field2_count__gt=1,
                    field3_count__gt=1
                    )

        duplicate_objects = Model.objects.filter(field1__in=[item['field1'] for item in duplicates],
                                                         field2__in=[item['field2'] for item in duplicates],
                                                         field3__in=[item['field3'] for item in duplicates],
                                                         )
        serializer = ModelSerializer(duplicate_objects, many=True)
        return Response(serializer.data, status=status.HTTP_200_OK)

P.S. field1 is a foreign key therefore I am extracting the id here.

Upvotes: 2

PoDuck
PoDuck

Reputation: 1454

As far as I know, this needs to be done as a two step process. You find the fields that have duplicates in one query, and then gather all those objects in a second query.

from django.db.models import Count


duplicates = MyModel.objects.values('name') \
    .annotate(name_count=Count('id')) \
    .filter(name_count__gt=1)
duplicate_objects = MyModel.objects.filter(name__in=[item['name'] for item in duplicates])

duplicates will contain the names that have more than one occurrence, while duplicate_objects will contain all duplicate named objects.

Upvotes: 1

Related Questions