Ajit Panigrahi
Ajit Panigrahi

Reputation: 792

Remove duplicate unordered tuples from list

In a list of tuples, I want to have just one copy of a tuple where it may be (x, y) or (y, x).

So, in:

# pairs = list(itertools.product(range(3), range(3)))
pairs = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]

the result should be:

result = [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)] # updated pairs

This list of tuples is generated using itertools.product() but I want to remove the duplicates.

My working solution:

pairs = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]

result = []
for pair in pairs:
    a, b = pair
    # reordering in increasing order
    temp = (a, b) if a < b else (b, a)
    result.append(temp)
print(list(set(result))) # I could use sorted() but the order doesn't matter

How can this be improved?

Upvotes: 1

Views: 1319

Answers (3)

jpp
jpp

Reputation: 164673

This is one solution which relies on sparse matrices. This works for the following reasons:

An entry in a matrix cannot contain two values. Therefore, uniqueness is guaranteed.

Selecting the upper triangle ensures that (0, 1) is preferred above (1, 0), and inclusion of both is not possible.

import numpy as np
from scipy.sparse import csr_matrix, triu

lst = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1),
       (1, 2), (2, 0), (2, 1), (2, 2)]

# get row coords & col coords
d1, d2 = list(zip(*lst))

# set up sparse matrix inputs
row, col, data = np.array(d1), np.array(d2), np.array([1]*len(lst))

# get upper triangle of matrix including diagonal
m = triu(csr_matrix((data, (row, col))), 0)

# output coordinates
result = list(zip(*(m.row, m.col)))

# [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]

Upvotes: 1

ryati
ryati

Reputation: 360

edit I just realized, your solution matches my solution. What you are doing is just fine. If you need to do this for a very large list, then there are some other options you may want to look into, like a key value store.

If you need to remove dupes more programatically, then you can use a function like this:

def set_reduce(pairs):
    new_pairs = set([])
    for x,y in pairs:
        if x < y:
            new_pairs.add((x,y))
        else:
            new_pairs.add((y,x))
    return new_pairs

running this results in

>>>set_reduce(pairs)
set([(0, 1), (1, 2), (0, 0), (0, 2), (2, 2), (1, 1)])

Upvotes: 1

Wondercricket
Wondercricket

Reputation: 7872

You could use combinations_with_replacement

The code for combinations_with_replacement() can be also expressed as a subsequence of product() after filtering entries where the elements are not in sorted order (according to their position in the input pool)

import itertools
pairs = list(itertools.combinations_with_replacement(range(3), 2))
print(pairs)

>>> [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]

Upvotes: 4

Related Questions