user7179686

Sorting a list of dictionaries by all keys being unique

Pulling my hair out with this one.

I have a list of dictionaries without a unique primary ID key for each unique entry (the dictionary is built on the fly):

dicts = [{'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'},
         {'firstname': 'john', 'lastname': 'roe', 'code': 'roe'},
         {'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'},
         {'firstname': 'thom', 'lastname': 'doe', 'code': 'crumpets'},
]

How do I go about filtering a list of dictionaries like this so that any repeated dictionaries within the list are removed? I need to check whether all three key/value pairs of one dictionary match another in the list, and if that check is met, discard the duplicate from the list.

So, for my example above, the first and third "entries" need to be removed as they are duplicates.

Upvotes: 2

Views: 76

Answers (3)

Aristide

Reputation: 3994

Removing duplicates from a list of non-hashable elements requires making them hashable on the fly:

def remove_duplicated_dicts(elements):
    seen = set()
    result = []
    for element in elements:
        element_as_tuple = tuple(element.items())
        if element_as_tuple not in seen:
            seen.add(element_as_tuple)
            result.append(element)
    return result

d = [{'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'},
     {'firstname': 'john', 'lastname': 'roe', 'code': 'roe'},
     {'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'},
     {'firstname': 'thom', 'lastname': 'doe', 'code': 'crumpets'},
]

print(remove_duplicated_dicts(d))

PS.

Non-obvious differences from the accepted answer of Moses Koledoye (as of 2017-06-19 at 13:00:00):

  • preservation of the original list order;
  • faster conversions: dict -> tuple instead of dict -> frozenset -> dict (take this with a grain of salt: I have not benchmarked it).
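One caveat worth noting (my addition, not part of the original answer): tuple(element.items()) is sensitive to the insertion order of the keys, so two equal dicts whose keys were inserted in different orders would not be detected as duplicates. Sorting the items first sidesteps this; a minimal sketch:

```python
a = {'firstname': 'john', 'lastname': 'doe'}
b = {'lastname': 'doe', 'firstname': 'john'}  # same content, different insertion order

# The dicts compare equal, but their item tuples do not.
print(a == b)                                # True
print(tuple(a.items()) == tuple(b.items()))  # False

# Sorting the items gives an order-insensitive hashable key.
print(tuple(sorted(a.items())) == tuple(sorted(b.items())))  # True
```

If all the dicts are built the same way (as in the question), plain tuple(element.items()) is fine.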

Upvotes: 2

Moses Koledoye

Reputation: 78554

You can create frozensets from the dicts and put those in a set to remove the duplicates:

dcts = [dict(d) for d in set(frozenset(d.items()) for d in dcts)]
print(dcts)

[{'code': 'roe', 'firstname': 'john', 'lastname': 'roe'},
 {'code': 'crumpets', 'firstname': 'thom', 'lastname': 'doe'},
 {'code': 'crumpets', 'firstname': 'john', 'lastname': 'doe'}]

If you instead want to remove every entry that has a duplicate (rather than keep one copy of each), you can use a counter:

from collections import Counter

dcts = [dict(d) for d, cnt in Counter(frozenset(d.items()) for d in dcts).items()
        if cnt == 1]
print(dcts)

[{'code': 'roe', 'firstname': 'john', 'lastname': 'roe'},
 {'code': 'crumpets', 'firstname': 'thom', 'lastname': 'doe'}]

Upvotes: 5

willeM_ Van Onsem

Reputation: 476709

Given that the values of the dictionaries are hashable, we can write our own uniqueness filter:

def uniq(iterable, key=lambda x: x):
    # Track the keys seen so far and yield only first occurrences.
    keys = set()
    for item in iterable:
        ky = key(item)
        if ky not in keys:
            yield item
            keys.add(ky)

We can then simply use the filter, like:

list(uniq(dicts,key=lambda x:(x['firstname'],x['lastname'],x['code'])))

The filter maintains the original order, and will - for this example - generate:

>>> list(uniq(dicts,key=lambda x:(x['firstname'],x['lastname'],x['code'])))
[{'code': 'crumpets', 'firstname': 'john', 'lastname': 'doe'},
 {'code': 'roe', 'firstname': 'john', 'lastname': 'roe'},
 {'code': 'crumpets', 'firstname': 'thom', 'lastname': 'doe'}]
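If you would rather not enumerate the fields in the key function, a possible variant (my addition, assuming all values are hashable) uses a frozenset of the items so every key/value pair is covered automatically:

```python
def uniq(iterable, key=lambda x: x):
    # Track the keys seen so far and yield only first occurrences.
    keys = set()
    for item in iterable:
        ky = key(item)
        if ky not in keys:
            yield item
            keys.add(ky)

dicts = [{'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'},
         {'firstname': 'john', 'lastname': 'roe', 'code': 'roe'},
         {'firstname': 'john', 'lastname': 'doe', 'code': 'crumpets'},
         {'firstname': 'thom', 'lastname': 'doe', 'code': 'crumpets'}]

# frozenset(d.items()) covers all keys without naming them explicitly.
print(list(uniq(dicts, key=lambda d: frozenset(d.items()))))
```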

Upvotes: 1
