How to remove dictionaries in a list which have some duplicate values (but not all)

Question

For example, let's say I'm given this list of dictionaries:

test1 = {'Count':34, 'Letter':'a', 'Word':'all'}
test2 = {'Count':890, 'Letter':'a', 'Word':'all'}
test3 = {'Count':333, 'Letter':'b', 'Word':'joy'}
test4 = {'Count':2, 'Letter':'a', 'Word':'all'}

test_list = [test1, test2, test3, test4]

Ideally, I want to remove all of the dictionaries from the list which have 'Letter':'a' and 'Word':'all' with the exception of one, where the one I keep has the largest value of 'Count'. In this case, I would want the list to be reduced to having only [test2, test3]. Is there a simple way to do this?

I've only been able to find resources which can remove duplicates if the entire dictionary is the same, but I haven't found anything for when only a small number of values are the same. Any help is appreciated.

Martijn Pieters · Accepted Answer

You'd want to group your dictionaries first, then keep only the dictionary with the highest value for 'Count' in each group. You can use a dictionary keyed on the ('Letter', 'Word') pair to collect the dictionaries that fall in the same group:

grouped = {}
for d in test_list:
    group_key = d['Letter'], d['Word']
    grouped.setdefault(group_key, []).append(d)

test_list = [max(dlist, key=lambda d: d['Count']) for dlist in grouped.values()]

This lets you filter the dictionaries in linear time (O(n)).
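
If you don't need the intermediate lists, the same idea can be sketched as a single pass that remembers only the best dictionary seen so far for each group. This is a variant, not the code above; the function name keep_max_per_group and its parameters group_keys and value_key are hypothetical names chosen for illustration.

```python
def keep_max_per_group(dicts, group_keys=('Letter', 'Word'), value_key='Count'):
    """Return one dict per group: the one with the largest value_key.

    Hypothetical helper for illustration; on ties the first dict seen wins,
    which matches what max() does in the list-based version.
    """
    best = {}
    for d in dicts:
        # Build the grouping key from the fields that must match.
        key = tuple(d[k] for k in group_keys)
        # Keep d only if the group is new or d beats the current best.
        if key not in best or d[value_key] > best[key][value_key]:
            best[key] = d
    return list(best.values())


test_list = [
    {'Count': 34, 'Letter': 'a', 'Word': 'all'},
    {'Count': 890, 'Letter': 'a', 'Word': 'all'},
    {'Count': 333, 'Letter': 'b', 'Word': 'joy'},
    {'Count': 2, 'Letter': 'a', 'Word': 'all'},
]
result = keep_max_per_group(test_list)
# On Python 3.7+ result is:
# [{'Count': 890, 'Letter': 'a', 'Word': 'all'},
#  {'Count': 333, 'Letter': 'b', 'Word': 'joy'}]
```

This is still O(n) and uses less memory, since each group holds one dictionary instead of a list.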

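Note that the output order is not necessarily the input order for Python versions < 3.6 (dictionaries preserve insertion order as an implementation detail of CPython 3.6, and as a language guarantee from 3.7 onwards). If order matters on older versions, add from collections import OrderedDict and replace grouped = {} with grouped = OrderedDict().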
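
For reference, the order-preserving version for older Pythons might look like the following sketch. It is the same grouping code with only the container swapped, and the sample data is repeated so the block runs on its own.

```python
from collections import OrderedDict

test_list = [
    {'Count': 34, 'Letter': 'a', 'Word': 'all'},
    {'Count': 890, 'Letter': 'a', 'Word': 'all'},
    {'Count': 333, 'Letter': 'b', 'Word': 'joy'},
    {'Count': 2, 'Letter': 'a', 'Word': 'all'},
]

# OrderedDict remembers the order in which group keys were first inserted,
# so groups come out in the order they first appear in the input.
grouped = OrderedDict()
for d in test_list:
    group_key = d['Letter'], d['Word']
    grouped.setdefault(group_key, []).append(d)

# Pick the dictionary with the highest 'Count' from each group.
result = [max(dlist, key=lambda d: d['Count']) for dlist in grouped.values()]
```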

Demo:

>>> test1 = {'Count':34, 'Letter':'a', 'Word':'all'}
>>> test2 = {'Count':890, 'Letter':'a', 'Word':'all'}
>>> test3 = {'Count':333, 'Letter':'b', 'Word':'joy'}
>>> test4 = {'Count':2, 'Letter':'a', 'Word':'all'}
>>> test_list = [test1, test2, test3, test4]
>>> grouped = {}
>>> for d in test_list:
...     group_key = d['Letter'], d['Word']
...     grouped.setdefault(group_key, []).append(d)
...
>>> [max(dlist, key=lambda d: d['Count']) for dlist in grouped.values()]
[{'Count': 890, 'Letter': 'a', 'Word': 'all'}, {'Count': 333, 'Letter': 'b', 'Word': 'joy'}]
