RNC

Reputation: 107

Merge two Django query sets and deduplicate objects sharing a common value

I have a Django project that uses a segmentation library. I need to merge two query sets and deduplicate the result, and I can't work out how to do it.

not_segments = Page.objects.all().no_segments() (paraphrased) gives me pages with segment pages excluded.

only_segments = Segment.objects.get_queryset().for_user(user=user) (paraphrased) gives me segmented page objects from the same model, but of course there are overlaps.

not_segments = Page 1, Page 2, Page 3, Page 4
only_segments = Page 2 (for user), Page 4 (for user)

So let’s say there’s a guid field in the model which is not enforced as unique, but rather identical in value between a root page and its segment child page. How do I compare the objects of the two query sets when merging them and omit objects from not_segments if an object with the same guid exists in only_segments?

The desired result is queryset = Page 1, Page 2 (for user), Page 3, Page 4 (for user)

Upvotes: 1

Views: 785

Answers (1)

Niel Godfrey P. Ponciano

Reputation: 10709

If not_segments and only_segments are querysets of the same model, you can combine them with the OR (|) operator, which produces another queryset. Each row appears only once in the result, though rows that merely share a guid are still treated as distinct.

deduplicated_qs = not_segments | only_segments
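To see what uniqueness means here: `|` ORs the two filters into one query, so each row is returned once, but as far as I know it does not collapse different rows that happen to share a guid. Below is a plain-Python model of that behaviour, not Django code. The (pk, guid, label) tuples are hypothetical stand-ins for rows, and a set union stands in for the combined query.

```python
# Hypothetical rows as (primary_key, guid, label) tuples; these are not Django objects.
not_segments = {
    (1, "g1", "Page 1"),
    (2, "g2", "Page 2"),
    (3, "g3", "Page 3"),
    (4, "g4", "Page 4"),
}
only_segments = {
    (5, "g2", "Page 2 (for user)"),
    (6, "g4", "Page 4 (for user)"),
}

# Set union models `not_segments | only_segments`: unique per row, not per guid.
combined = not_segments | only_segments

guids = sorted(guid for _, guid, _ in combined)
print(len(combined))  # 6 rows survive
print(guids)          # ['g1', 'g2', 'g2', 'g3', 'g4', 'g4']
```

If the root page and its segment child are separate rows, as in the question, the guid-based filtering still has to be done as an extra step.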

If they are records from different models, you can filter out duplicates manually by tracking the guids that have already been added and skipping any object whose guid has been seen.

import itertools

# To keep the example simple, these are plain Python classes. In a real project they would be Django model classes.
class Page:
    def __init__(self, guid, value):
        self.guid = guid
        self.value = value

class Segment:
    def __init__(self, guid, other_value):
        self.guid = guid
        self.other_value = other_value

only_segments = [
    Segment(2, 'A'),
    Segment(4, 'B'),
]
not_segments = [
    Page(1, 'C'),
    Page(2, 'D'),
    Page(3, 'E'),
    Page(4, 'F'),
]

added_guids = set()
deduplicated_pages = []

# only_segments comes first in the chain, so its objects win over
# not_segments objects that carry the same guid.
for page_or_segment in itertools.chain(only_segments, not_segments):
    if page_or_segment.guid in added_guids:
        continue

    added_guids.add(page_or_segment.guid)
    deduplicated_pages.append(page_or_segment)

for page in deduplicated_pages:
    print(type(page), page.__dict__)

Output

<class '__main__.Segment'> {'guid': 2, 'other_value': 'A'}
<class '__main__.Segment'> {'guid': 4, 'other_value': 'B'}
<class '__main__.Page'> {'guid': 1, 'value': 'C'}
<class '__main__.Page'> {'guid': 3, 'value': 'E'}
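The loop above emits the only_segments objects first, so the result is ordered 2, 4, 1, 3 rather than the 1, 2, 3, 4 order asked for in the question. If the original page order matters, one possible variation is to walk the fallback list and swap in the preferred object whenever the guids match. This is only a sketch: `merge_by_guid` and the `Item` dataclass are hypothetical names used for illustration, and the function returns a plain list, not a queryset.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a Page or Segment model instance."""
    guid: int
    label: str


def merge_by_guid(preferred, fallback):
    """Return fallback items in order, replaced by the preferred item on a guid match."""
    preferred_by_guid = {item.guid: item for item in preferred}
    merged = []
    seen = set()
    for item in fallback:
        if item.guid in seen:
            continue
        seen.add(item.guid)
        merged.append(preferred_by_guid.get(item.guid, item))
    # Preferred items with no counterpart in fallback go at the end.
    for item in preferred:
        if item.guid not in seen:
            seen.add(item.guid)
            merged.append(item)
    return merged


pages = [Item(1, "Page 1"), Item(2, "Page 2"), Item(3, "Page 3"), Item(4, "Page 4")]
segments = [Item(2, "Page 2 (for user)"), Item(4, "Page 4 (for user)")]

result = [item.label for item in merge_by_guid(segments, pages)]
print(result)
# ['Page 1', 'Page 2 (for user)', 'Page 3', 'Page 4 (for user)']
```

Because both inputs are fully iterated in Python, this suits result sets small enough to hold in memory.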

Upvotes: 2
