Reputation: 411012
I have a set of Django models as shown in the following diagram (the names of the reverse relationships are shown in the yellow bubbles):
(source: cbstaff.com)
In each relationship, a Person
may have 0 or more of the items.
Additionally, the slug
field is (unfortunately) not unique; multiple Person
records may have the same slug fields. Essentially these records are duplicates.
I want to obtain a list of all records that meet the following criteria: All duplicate records (that is, having the same slug) with at least one Entry
OR at least one Audio
OR at least one Episode
OR at least one Article
.
So far, I have the following query:
Person.objects.values('slug').annotate(num_records=Count('slug')).filter(num_records__gt=1)
This groups all records by slug
, then adds a num_records
attribute that says how many records have that slug, but the additional filtering is not performed (and I don't even know if this would work right anyway, since, given a set of duplicate records, one may have, e.g., and Entry
and the other may have an Article
).
In a nutshell, I want to find all duplicate records and collapse them, along with their associated models, into one record.
What's the best way to do this with Django?
Upvotes: 9
Views: 1443
Reputation: 6576
I would do this in several queries. The first is your list of duplicates, that you have:
dupes = [p['slug'] for p in Person.objects.values('slug').annotate(num_records=Count('slug')).filter(num_records__gt=1)]
I would then loop through these, and for each one decide on which to keep (make an arbitrary decision - pick the first one). Then, for all the other primary keys, just update all the other objects to point to the primary key you have selected:
for slug in dupes:
pks = [p.id for p in Person.objects.filter(slug=slug)]
for pk in pks[1:]:
Audio.objects.filter(person=pk).update(person=pks[0])
Author.objects.filter(person=pk).update(person=pks[0])
Episode.objects.filter(person=pk).update(person=pks[0])
Entry.objects.filter(person=pk).update(person=pks[0])
Upvotes: 1
Reputation: 842
I'm not sure that chaining filters is going to get you where you want to go, since there will be Person entries that have two or more types of artifacts with their name on it. Take a look at a previous StackOverflow question and its answer, which I think will help you combine four queries into a single QuerySet in the way you want:
How to combine 2 or more querysets in a Django view?
Upvotes: 0
Reputation: 2528
Did you look into Django aggregration for 'group by' like behaviour?
http://docs.djangoproject.com/en/dev/topics/db/aggregation/
Upvotes: 0