Robbie Milejczak

Reputation: 5780

Python dynamically compute a list with no duplicates using only list comprehensions

This is a somewhat ridiculous and weird use case, but bear with me. I have this list comprehension:

"reading_types": [
    {
        "name": rt.reading_type,
        "value": rt.reading_type_id,
    }
    for unit in item.units
    for rt in unit.reading_types
],

in a backend API call. It works great, except that there will almost always be duplicates in the end result. How can I ensure that no duplicates are returned?

This is actually happening inside another list comprehension, and I can't reference the list at any point to remove duplicates, so I must do so within the list comprehension itself.

I've tried using a set:

set([
    {
        "name": rt.reading_type,
        "value": rt.reading_type_id,
    }
    for unit in item.units
    for rt in unit.reading_types
])

but this results in the error: TypeError: unhashable type: 'dict'

Upvotes: 3

Views: 134

Answers (4)

Jean-François Fabre

Reputation: 140307

The idea is to make your structures hashable without destroying them, so that you can restore them to their original form afterwards.

You could convert each dictionary's items to a tuple (tuples of strings are hashable, so they can go in a set), build a set of those tuples, and then convert each tuple back to a dictionary:

input_list = [{"name": "name1", "id": "id1"},
              {"name": "name2", "id": "id2"},
              {"name": "name1", "id": "id1"}]

output_list = [dict(items) for items in {tuple(a.items()) for a in input_list}]

This works because values of the sub-dicts are hashable (strings). If they were dictionaries, we'd have to convert them too.

result:

[{'id': 'id1', 'name': 'name1'}, {'id': 'id2', 'name': 'name2'}]
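If the values were nested dictionaries, one way to "convert them too" is a small recursive helper (a sketch extending the answer's idea, not part of the original) that freezes every dict into a sorted tuple of items before deduplicating:

```python
def freeze(obj):
    """Recursively convert dicts and lists into hashable tuples."""
    if isinstance(obj, dict):
        return tuple(sorted((k, freeze(v)) for k, v in obj.items()))
    if isinstance(obj, list):
        return tuple(freeze(v) for v in obj)
    return obj

# hypothetical nested input with a duplicate entry
nested = [{"name": "n1", "meta": {"id": 1}},
          {"name": "n1", "meta": {"id": 1}}]

# use the frozen form as a dict key so duplicates clobber each other
unique = list({freeze(d): d for d in nested}.values())
```

Here the frozen tuples serve only as deduplication keys; the original, unmodified dicts are what end up in the result.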

Another solution (by Jon Clements) doesn't use a set but builds a dictionary (via a dictionary comprehension), relying on key uniqueness to clobber duplicates, then extracts only the values:

list({tuple(d.items()):d for d in input_list}.values())
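Applied to the same input_list as above, this one-liner produces the same deduplicated result, and on Python 3.7+ (where dicts preserve insertion order) it also keeps the original order of first appearance:

```python
input_list = [{"name": "name1", "id": "id1"},
              {"name": "name2", "id": "id2"},
              {"name": "name1", "id": "id1"}]

# later duplicates overwrite earlier ones under the same tuple key,
# but the key's position (and thus the order) is set on first insertion
deduped = list({tuple(d.items()): d for d in input_list}.values())
```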

Upvotes: 6

Patrick Haugh

Reputation: 61063

You can use a namedtuple instead of a dictionary inside the set. Namedtuples are immutable and therefore hashable, which dictionaries are not. You can also use a set comprehension directly:

from collections import namedtuple

reading_type = namedtuple("reading_type", ["name", "value"])

{reading_type(rt.reading_type, rt.reading_type_id) 
    for unit in item.units
    for rt in unit.reading_types}
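A self-contained sketch of the same approach, with hypothetical sample data standing in for item.units / unit.reading_types: if the API response must ultimately contain dicts, each namedtuple can be converted back with its _asdict() method.

```python
from collections import namedtuple

reading_type = namedtuple("reading_type", ["name", "value"])

# hypothetical (name, id) pairs, with one duplicate
raw = [("temp", 1), ("humidity", 2), ("temp", 1)]

# the set collapses duplicates because equal namedtuples hash equally
unique = {reading_type(name, value) for name, value in raw}

# convert back to plain dicts for the JSON response
as_dicts = [rt._asdict() for rt in unique]
```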

Upvotes: 2

jpp

Reputation: 164843

This isn't a list comprehension, but you can use the itertools unique_everseen recipe, also available in third-party libraries, e.g. more_itertools.unique_everseen:

from more_itertools import unique_everseen

input_list = [{"name":"name1","id":"id1"},{"name":"name2","id":"id2"},
              {"name":"name1","id":"id1"}]

res = list(unique_everseen(input_list, key=lambda d: tuple(sorted(d.items()))))

print(res)

[{'name': 'name1', 'id': 'id1'}, {'name': 'name2', 'id': 'id2'}]

The trick is to make sure you can hash your dictionaries, which here is done by converting each dictionary to a tuple of sorted key-value tuples. Internally, the algorithm maintains a "seen" set of keys and yields only values whose key is not already in the set, adding each new key as it is encountered.
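A simplified sketch of that internal logic (modelled on the itertools recipe; more_itertools adds extra handling for unhashable keys) makes the "seen" set explicit:

```python
def unique_everseen(iterable, key=None):
    """Yield elements in order, skipping any whose key was already seen."""
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element

input_list = [{"name": "name1", "id": "id1"},
              {"name": "name2", "id": "id2"},
              {"name": "name1", "id": "id1"}]

res = list(unique_everseen(input_list, key=lambda d: tuple(sorted(d.items()))))
```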

Upvotes: 0

Tim

Reputation: 2843

You can wrap your entire list in another comprehension that applies repr to each entry, and use set on that:

set([repr(val) for val in [...]])
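To actually recover the dicts afterwards, the repr strings would need to be parsed back, e.g. with ast.literal_eval. A sketch (not from the answer) of the round trip; note the caveat that two equal dicts whose keys were inserted in different orders produce different reprs, so this approach can miss such duplicates:

```python
import ast

input_list = [{"name": "name1", "id": "id1"},
              {"name": "name1", "id": "id1"}]

# deduplicate on the string representation...
unique_reprs = {repr(d) for d in input_list}

# ...then parse each string back into a dict
unique = [ast.literal_eval(s) for s in unique_reprs]
```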

Upvotes: -1
