David Schumann
David Schumann

Reputation: 14793

Most pythonic way to remove tuples from a list if first element is a duplicate

The code I have so far is pretty ugly:

orig = [(1,2),(1,3),(2,3),(3,3)]
previous_elem = []
unique_tuples = []
for tuple in orig:
    if tuple[0] not in previous_elem:
        unique_tuples += [tuple]
    previous_elem += [tuple[0]]
assert unique_tuples == [(1,2),(2,3),(3,3)]

There must be a more pythonic solution.

Upvotes: 2

Views: 2534

Answers (3)

miradulo
miradulo

Reputation: 29710

If you don't care which tuple round you return for duplicates, you could always convert your list to a dictionary and back:

>>> orig = [(1,2),(1,3),(2,3),(3,3)]
>>> list(dict(orig).items())
[(1, 3), (2, 3), (3, 3)]

If you want to return the first tuple round, you could reverse your list twice and use an OrderedDict, like this:

>>> from collections import OrderedDict
>>> orig = [(1,2),(1,3),(2,3),(3,3)]
>>> new = list(OrderedDict(orig[::-1]).items())[::-1]
[(1, 2), (2, 3), (3, 3)]

These are not the most efficient solutions (if that's of great importance) , but they do make for nice idiomatic one-liners.


Some benchmarking

Note the difference in speed, and that should you not care which tuple round you return, the first option is much more efficient:

>>> import timeit
>>> setup = '''
orig = [(1,2),(1,3),(2,3),(3,3)]
'''
>>> print (min(timeit.Timer('(list(dict(orig).items()))', setup=setup).repeat(7, 1000)))
0.0015771419037069459

compared to

>>>setup = '''
orig = [(1,2),(1,3),(2,3),(3,3)]
from collections import OrderedDict
'''
>>> print (min(timeit.Timer('(list(OrderedDict(orig[::-1]).items())[::-1])', 
             setup=setup).repeat(7, 1000)))
0.024554947372323

The first option is nearly 15 times faster according to these speed tests.

That being said however, Saksham's answer is also O(n) and smashes these dictionary methods efficiency wise:

>>> setup = '''
orig = [(1,2),(1,3),(2,3),(3,3)]
newlist = []
seen = set()
def fun():
    for (a, b) in orig:
        if not a in seen:
            newlist.append((a, b))
            seen.add(a)
    return newlist
'''
>>> print (min(timeit.Timer('fun()', setup=setup).repeat(7, 1000)))
0.0004833390384996095

Upvotes: 8

Saksham Varma
Saksham Varma

Reputation: 2140

If you do not wish to store in an additional data structure, the time complexity O(n^2), as pointed out in the comments:

orig = [(1,2),(1,3),(2,3),(3,3)]
newlist = []
for (a, b) in orig:
    if not any(x == a for x, y in newlist):
        newlist.append((a, b))
print newlist    # prints [(1, 2), (2, 3), (3, 3)]

A little book-keeping can reduce it to linear time:

orig = [(1,2),(1,3),(2,3),(3,3)]
newlist = []
seen = set()
for (a, b) in orig:
    if not a in seen:
        newlist.append((a, b))
        seen.add(a)
print newlist    # prints [(1, 2), (2, 3), (3, 3)]

Upvotes: 1

Scis
Scis

Reputation: 2984

If you want the first containing a specific key to always be the one that appears in the final list:

list(reversed(collections.OrderedDict( reversed([(1,2),(1,3),(2,3),(3,3)])).items()))

Which results in:

 [(1, 2), (2, 3), (3, 3)]

Upvotes: 2

Related Questions