Jack
Jack

Reputation: 161

Find unique tuples in a list(ignore order) while preserving the others' original order in python?

Say I have a list looks something like this:

a = [(1,2),(3,1),(2,1),(4,5),(9,3),(1,3)]

And afterwards, I want to have something looks like this:

b = [(1,2),(3,1),(4,5),(9,3)]

Many Thanks!

Upvotes: 2

Views: 943

Answers (4)

Eugene Yarmash
Eugene Yarmash

Reputation: 149973

In Python 3.7+ (i.e., a version where dictionaries maintain insertion order) you can simply convert your list to a dictionary keyed by frozensets of the tuples and take its values:

a = [(1,2), (3,1), (2,1), (4,5), (9,3), (1,3)]
d = {}

for x in a:
    d.setdefault(frozenset(x), x)

print(list(d.values()))  # [(1, 2), (3, 1), (4, 5), (9, 3)]

Upvotes: 1

jamylak
jamylak

Reputation: 133634

Modification of @Pavel 's solution which uses frozenset making it more efficient because it avoids the sorting and generally faster in raw speed. This should be O(nm) compared to @Pavel's which should be O(n * m log(m)) where n is the length of the list and m is the the length of each tuple.

>>> a = [(1,2),(3,1),(2,1),(4,5),(9,3),(1,3)]
>>> b = []
>>> seen = set()
>>> for t in a:
        s = frozenset(t)
        if s not in seen:
            seen.add(s)
            b.append(t)


>>> b
[(1, 2), (3, 1), (4, 5), (9, 3)]

And here's proof of the difference:

from timeit import timeit


def dosorted(a):
    b = []
    seen = set()
    for t in a:
        s = tuple(sorted(t))
        if s not in seen:
            seen.add(s)
            b.append(t)
    return b

def dofrozenset(a):
    b = []
    seen = set()
    for t in a:
        s = frozenset(t)
        if s not in seen:
            seen.add(s)
            b.append(t)
    return b

import random
a = [(1,2),(3,1),(2,1),(4,5),(9,3),(1,3)]
b = [tuple(random.randrange(3) for x in range(10)) for x in range(10)]
c = [tuple(random.randrange(3) for x in range(20)) for x in range(20)]

setup = '''
from __main__ import a, b, c, dosorted, dofrozenset'''

print timeit(setup=setup, stmt='dosorted(a)')
print timeit(setup=setup, stmt='dosorted(b)')
print timeit(setup=setup, stmt='dosorted(c)')
print timeit(setup=setup, stmt='dofrozenset(a)')
print timeit(setup=setup, stmt='dofrozenset(b)')
print timeit(setup=setup, stmt='dofrozenset(c)')

9.23814695723 # dosorted(a)
26.8939069072 # dosorted(b)
86.6305864991 # dosorted(c)

5.99305211975 # dofrozenset(a)
10.708619182  # dofrozenset(b)
25.5252673175 # dofrozenset(c)

You could add some more tweaks to make these even faster such as the use of a list comprehension but that gets ugly quick. Another commonly seen technique that can be used in conjunction with the last one is this:

seen_add, b_append = seen.add, b.append # speeds up name lookup

These can then be called directly from then on, but remember that premature optimization is evil.

Upvotes: 3

Greg
Greg

Reputation: 5588

Just take your list, make it into a set, then turn it back into a list

>>>a = [(1, 2), (3, 1), (2, 1), (4, 5), (9, 3), (1, 3)]
>>>sorted_tuples = [tuple(sorted(tuple_)) for tuple_ in a]
>>>list(set(sorted_tuples))
[(1, 2), (4, 5), (3, 9), (1, 3)]

Upvotes: 1

Pavel Anossov
Pavel Anossov

Reputation: 62928

b = []
seen = set()
for t in a:
    s = tuple(sorted(t))
    if s not in seen:
        seen.add(s)
        b.append(t)

or

seen = set()
b = [t for t in a if tuple(sorted(t)) not in seen and not seen.add(tuple(sorted(t)))]

Upvotes: 4

Related Questions