Reputation:
Pulling my hair out with this one.
I have a list of dictionaries without a unique primary ID key for each unique entry (the dictionaries are built on the fly):
dicts = [{'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'},
{'firstname': 'john', 'lastname': 'roe', 'code': 'roe'},
{'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'},
{'firstname': 'thom', 'lastname': 'doe', 'code': 'crumpets'},
]
How do I go about filtering a list of dictionaries like this so that any repeating dictionaries within the list are removed? I need to check whether all three keys of a dictionary match those of another dictionary in the list, and discard it from the list if that check is met.
So, for my example above, the first and third "entries" are duplicates of each other, and the duplicate needs to be removed.
Upvotes: 2
Views: 76
Reputation: 3994
Removing duplicates from a list of non-hashable elements requires making them hashable on the fly:
def remove_duplicated_dicts(elements):
    seen = set()
    result = []
    for element in elements:
        element_as_tuple = tuple(element.items())
        if element_as_tuple not in seen:
            seen.add(element_as_tuple)
            result.append(element)
    return result

d = [{'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'},
     {'firstname': 'john', 'lastname': 'roe', 'code': 'roe'},
     {'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'},
     {'firstname': 'thom', 'lastname': 'doe', 'code': 'crumpets'},
     ]
print(remove_duplicated_dicts(d))
PS.
A non-obvious difference from the accepted answer of Moses Koledoye (as of 2017-06-19 at 13:00:00): this converts
dict -> tuple
instead of dict -> frozenset -> dict
(take it with a grain of salt: I have made no benchmark).
Upvotes: 2
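One caveat worth noting (my addition, not part of the answer above): `tuple(element.items())` is sensitive to key insertion order, so two dicts with identical key-value pairs inserted in different orders produce different tuples and would both be kept. Sorting the items first makes the deduplication order-independent. A minimal sketch:

```python
def remove_duplicated_dicts(elements):
    # Sort the items so key insertion order does not affect deduplication.
    seen = set()
    result = []
    for element in elements:
        element_as_tuple = tuple(sorted(element.items()))
        if element_as_tuple not in seen:
            seen.add(element_as_tuple)
            result.append(element)
    return result

a = {'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'}
b = {'code': 'crumpets', 'firstname': 'john', 'lastname': 'doe'}  # same pairs, different order
print(remove_duplicated_dicts([a, b]))  # keeps only the first dict
```

With the unsorted `tuple(element.items())` version, both `a` and `b` would survive.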
Reputation: 78554
You can create frozensets from the dicts and put those in a set to remove dupes:
dcts = [dict(d) for d in set(frozenset(d.items()) for d in dcts)]
print(dcts)
[{'code': 'roe', 'firstname': 'john', 'lastname': 'roe'},
{'code': 'crumpets', 'firstname': 'thom', 'lastname': 'doe'},
{'code': 'crumpets', 'firstname': 'john', 'lastname': 'doe'}]
If you instead want to remove every entry that has a duplicate (keeping only dicts that occur exactly once), you can use a Counter:
from collections import Counter
dcts = [dict(d) for d, cnt in Counter(frozenset(d.items()) for d in dcts).items()
if cnt==1]
print(dcts)
[{'code': 'roe', 'firstname': 'john', 'lastname': 'roe'},
{'code': 'crumpets', 'firstname': 'thom', 'lastname': 'doe'}]
Upvotes: 5
Reputation: 476709
Given that the values of the dictionaries are hashable, we can write our own uniqueness filter:
def uniq(iterable, key=lambda x: x):
    keys = set()
    for item in iterable:
        ky = key(item)
        if ky not in keys:
            yield item
            keys.add(ky)
We can then simply use the filter, like:
list(uniq(dicts,key=lambda x:(x['firstname'],x['lastname'],x['code'])))
The filter maintains the original order, and will - for this example - generate:
>>> list(uniq(dicts,key=lambda x:(x['firstname'],x['lastname'],x['code'])))
[{'code': 'crumpets', 'firstname': 'john', 'lastname': 'doe'},
{'code': 'roe', 'firstname': 'john', 'lastname': 'roe'},
{'code': 'crumpets', 'firstname': 'thom', 'lastname': 'doe'}]
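As a usage note (my addition, not part of the answer): `operator.itemgetter` builds the same tuple key more concisely than the lambda, and the generator can be reused with any key function. A sketch:

```python
from operator import itemgetter

def uniq(iterable, key=lambda x: x):
    # Yield each item whose key has not been seen before, preserving order.
    keys = set()
    for item in iterable:
        ky = key(item)
        if ky not in keys:
            yield item
            keys.add(ky)

dicts = [{'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'},
         {'firstname': 'john', 'lastname': 'roe', 'code': 'roe'},
         {'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'},
         {'firstname': 'thom', 'lastname': 'doe', 'code': 'crumpets'}]

# itemgetter with several field names returns a hashable tuple per dict.
unique = list(uniq(dicts, key=itemgetter('firstname', 'lastname', 'code')))
print(unique)
```

Because `uniq` only hashes the key, not the whole dict, this version also works when some dict values are unhashable, as long as the fields named in the key are hashable.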
Upvotes: 1