NI6

Reputation: 2887

Removing duplicate elements by their attributes in Python

Is there a nice way to remove duplicate elements from a list, based on one of their attributes (here, the value in the second position)?

Example:

lis = [['element1', 12], ['element2', 2], ['element3', 12], ['element4', 36], ['element5', 12]]

And I want to get this list:

new_lis = [['element1', 12], ['element2', 2], ['element4', 36]]

I am looking for a short and elegant solution, maybe a module I am not familiar with?

Upvotes: 4

Views: 3155

Answers (3)

Chadys

Reputation: 196

My proposal for a one-liner:

{key(elt): elt for elt in reversed(iterable)}.values()

The reversed call is there so that the earliest element wins for each key; without it, later duplicates would override earlier ones. The price is that the result no longer follows the original order of the iterable. This might need to be adjusted depending on your constraints. It can be used like so, with the example given in the question:

from typing import Any, Callable, Iterable, Reversible, TypeVar
from operator import itemgetter

T = TypeVar("T")

# reversed() needs a reversible input (e.g. a list), not an arbitrary iterable
def get_unique_elements(iterable: Reversible[T], key: Callable[[T], Any]) -> Iterable[T]:
    """
    Returns all unique elements from an iterable,
    using the key function to establish uniqueness.
    Elements appearing first have priority in case of duplicates.
    """
    return {key(elt): elt for elt in reversed(iterable)}.values()

list(get_unique_elements(
    [
        ["element1", 12],
        ["element2", 2],
        ["element3", 12],
        ["element4", 36],
        ["element5", 12],
    ],
    key=itemgetter(1),
))

Out: [['element1', 12], ['element4', 36], ['element2', 2]]
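
If the original order matters, one possible variant (a sketch, not part of the one-liner above) is to fill the dict with setdefault, which only stores the first element seen for each key. It assumes Python 3.7+, where dicts keep insertion order; the name unique_keep_order is made up for illustration.

```python
from operator import itemgetter

def unique_keep_order(iterable, key):
    # Hypothetical helper: keep the first element for each key, in input order.
    first_seen = {}
    for elt in iterable:
        # setdefault inserts only if the key is not in the dict yet,
        # so later duplicates never replace the earliest element.
        first_seen.setdefault(key(elt), elt)
    return list(first_seen.values())

lis = [['element1', 12], ['element2', 2], ['element3', 12], ['element4', 36], ['element5', 12]]
print(unique_keep_order(lis, key=itemgetter(1)))
# [['element1', 12], ['element2', 2], ['element4', 36]]
```

No reversed call is needed here, so this also works on iterables that can only be traversed once.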

Upvotes: 0

Sede

Reputation: 61225

The best way to do this is to use a simple generator function. A generator is evaluated lazily: it produces each item on demand instead of building the whole result list up front, which saves a lot of memory for large lists (the set of seen values still grows with the number of distinct values). You can then iterate over the generator object and do something with each item.

Demo:

>>> lis = [['element1', 12], ['element2', 2], ['element3', 12], ['element4', 36], ['element5', 12]]
>>> def deduplicate(items):
...     seen = set()
...     for item in items:
...         if item[1] not in seen:
...             seen.add(item[1])
...             yield item
... 
>>> deduplicate(lis)
<generator object deduplicate at 0x7fd454352e08>
>>> for item in deduplicate(lis):
...     print(item)
... 
['element1', 12]
['element2', 2]
['element4', 36]
>>> list(deduplicate(lis))
[['element1', 12], ['element2', 2], ['element4', 36]]
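
The demo above hard-codes item[1] as the attribute. A minimal sketch of the same generator with a key function, so that any attribute can decide what counts as a duplicate, could look like this (unique_by and its key parameter are names I made up, and the key values are assumed to be hashable):

```python
def unique_by(items, key):
    # Hypothetical generalisation of deduplicate(): yield the first item
    # for each distinct value of key(item), lazily and in input order.
    seen = set()
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item

lis = [['element1', 12], ['element2', 2], ['element3', 12], ['element4', 36], ['element5', 12]]

gen = unique_by(lis, key=lambda item: item[1])
print(next(gen))   # only the first item has been processed so far
print(list(gen))   # the remaining unique items
```

Calling next first shows the laziness: nothing past the first item is examined until the generator is asked for more.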

Upvotes: 4

timgeb

Reputation: 78690

Write a function for this:

def remove_duplicates_n(lis, n):
    """Return a new list with the items from lis, skipping any item whose value at position n was already seen. Keeps order."""
    seen = set()
    result = []
    for item in lis:
        if item[n] not in seen:
            result.append(item)
            seen.add(item[n])
    return result

For your desired result, call remove_duplicates_n(lis, 1).

Bonus: if you want to go to the dark side of side effects...

>>> seen = set()
>>> [x for x in lis if x[1] not in seen and not seen.add(x[1])]
[['element1', 12], ['element2', 2], ['element4', 36]]
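
A short sketch of why the comprehension works and why it is the "dark side": set.add always returns None, so not seen.add(...) is always true and only runs when the first test passes. The catch is that seen lives outside the comprehension, so it has to be reset before reuse. The variable names first and second are just for illustration.

```python
lis = [['element1', 12], ['element2', 2], ['element3', 12], ['element4', 36], ['element5', 12]]

seen = set()
# set.add returns None, so "not seen.add(...)" is always True; it is only
# evaluated (and only mutates seen) when "x[1] not in seen" is True.
first = [x for x in lis if x[1] not in seen and not seen.add(x[1])]
print(first)    # [['element1', 12], ['element2', 2], ['element4', 36]]

# seen still holds 12, 2 and 36, so a second run without resetting it
# filters out every element.
second = [x for x in lis if x[1] not in seen and not seen.add(x[1])]
print(second)   # []
```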

Upvotes: 2
