Reputation: 1229
For example, let's say I'm given this list of dictionaries:
test1 = {'Count':34, 'Letter':'a', 'Word':'all'}
test2 = {'Count':890, 'Letter':'a', 'Word':'all'}
test3 = {'Count':333, 'Letter':'b', 'Word':'joy'}
test4 = {'Count':2, 'Letter':'a', 'Word':'all'}
test_list = [test1, test2, test3, test4]
Ideally, I want to remove all of the dictionaries from the list which have 'Letter':'a' and 'Word':'all', with the exception of one, where the one I keep has the largest value of 'Count'. In this case, I would want the list to be reduced to only [test2, test3]. Is there a simple way to do this?
I've only been able to find resources which can remove duplicates if the entire dictionary is the same, but I haven't found anything for when only a small number of values are the same. Any help is appreciated.
Upvotes: 0
Views: 47
Reputation: 71451
You can also try this one-liner:
test1 = {'Count':34, 'Letter':'a', 'Word':'all'}
test2 = {'Count':890, 'Letter':'a', 'Word':'all'}
test3 = {'Count':333, 'Letter':'b', 'Word':'joy'}
test4 = {'Count':2, 'Letter':'a', 'Word':'all'}
test_list = [test1, test2, test3, test4]
final_list = [i for i in test_list if not (i['Letter'] == 'a' and i['Word'] == 'all') or i['Count'] == max(b['Count'] for b in test_list if b['Letter'] == 'a' and b['Word'] == 'all')]
Output:
[{'Count': 890, 'Word': 'all', 'Letter': 'a'}, {'Count': 333, 'Word': 'joy', 'Letter': 'b'}]
Upvotes: 0
Reputation: 1121594
You'd want to group your dictionaries first, then keep only the dictionary with the highest value for 'Count' in each group. You can use a dictionary keyed on the ('Letter', 'Word') pair to collect the dictionaries that fall into the same group:
grouped = {}
for d in test_list:
    group_key = d['Letter'], d['Word']
    grouped.setdefault(group_key, []).append(d)
test_list = [max(dlist, key=lambda d: d['Count']) for dlist in grouped.values()]
This lets you filter the dictionaries in linear time (O(n)).
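Note that the output order is not necessarily the input order for Python versions < 3.6 (insertion order is an implementation detail of CPython 3.6 and only guaranteed by the language from 3.7 onward); replace grouped = {} with from collections import OrderedDict and grouped = OrderedDict() if order matters.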
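As a rough sketch of the order-preserving variant described above, the grouping can be wrapped in a small helper. The function name keep_max_per_group and its key_fields / value_field parameters are illustrative assumptions, not part of the original answer:

```python
from collections import OrderedDict


def keep_max_per_group(dicts, key_fields=('Letter', 'Word'), value_field='Count'):
    """Return one dict per group: the one with the largest value_field.

    Hypothetical helper; groups are emitted in first-seen order.
    """
    grouped = OrderedDict()  # preserves first-seen group order on any Python version
    for d in dicts:
        # Dicts that share the same values for key_fields land in the same group.
        group_key = tuple(d[field] for field in key_fields)
        grouped.setdefault(group_key, []).append(d)
    # max() returns the first maximal element, so ties keep the earliest dict.
    return [max(dlist, key=lambda d: d[value_field]) for dlist in grouped.values()]


test_list = [
    {'Count': 34, 'Letter': 'a', 'Word': 'all'},
    {'Count': 890, 'Letter': 'a', 'Word': 'all'},
    {'Count': 333, 'Letter': 'b', 'Word': 'joy'},
    {'Count': 2, 'Letter': 'a', 'Word': 'all'},
]
result = keep_max_per_group(test_list)
```

Here result should contain the 'Count': 890 dictionary followed by the 'Count': 333 one, since the ('a', 'all') group is seen first.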
Demo:
>>> test1 = {'Count':34, 'Letter':'a', 'Word':'all'}
>>> test2 = {'Count':890, 'Letter':'a', 'Word':'all'}
>>> test3 = {'Count':333, 'Letter':'b', 'Word':'joy'}
>>> test4 = {'Count':2, 'Letter':'a', 'Word':'all'}
>>> test_list = [test1, test2, test3, test4]
>>> grouped = {}
>>> for d in test_list:
...     group_key = d['Letter'], d['Word']
...     grouped.setdefault(group_key, []).append(d)
...
>>> [max(dlist, key=lambda d: d['Count']) for dlist in grouped.values()]
[{'Count': 890, 'Letter': 'a', 'Word': 'all'}, {'Count': 333, 'Letter': 'b', 'Word': 'joy'}]
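If the per-group lists are not needed afterwards, a possible variation is to remember only the best dictionary seen so far for each group, which still takes a single pass. This is a sketch rather than the approach shown above; the function name keep_max_single_pass is a made-up label, and the output order relies on dictionaries preserving insertion order (Python 3.7+):

```python
def keep_max_single_pass(dicts):
    """Keep, per ('Letter', 'Word') group, the dict with the largest 'Count'.

    Hypothetical helper: stores one dict per group instead of a list.
    """
    best = {}
    for d in dicts:
        group_key = d['Letter'], d['Word']
        current = best.get(group_key)
        # Replace the stored dict only when a strictly larger 'Count' appears,
        # so ties keep the dict that was seen first.
        if current is None or d['Count'] > current['Count']:
            best[group_key] = d
    return list(best.values())


test_list = [
    {'Count': 34, 'Letter': 'a', 'Word': 'all'},
    {'Count': 890, 'Letter': 'a', 'Word': 'all'},
    {'Count': 333, 'Letter': 'b', 'Word': 'joy'},
    {'Count': 2, 'Letter': 'a', 'Word': 'all'},
]
filtered = keep_max_single_pass(test_list)
```

Both versions run in linear time; this one trades the intermediate lists for a comparison on every element.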
Upvotes: 1