genclik27

Reputation: 323

itertools.product eliminating repeated elements

How can I skip tuples that contain duplicate elements when iterating over itertools.product? Or, better, is there a way to avoid generating them at all? Filtering them out afterwards may be time-consuming when there are many lists.

Example:
lis1 = [1,2]
lis2 = [2,4]
lis3 = [5,6]

I would like [i for i in product(lis1,lis2,lis3)] to give [(1,2,5), (1,2,6), (1,4,5), (1,4,6), (2,4,5), (2,4,6)]

The result should not contain (2,2,5) and (2,2,6), since 2 appears twice in each of them. How can I do that?

Upvotes: 5

Views: 8660

Answers (3)

Jeff Hernandez

Reputation: 2123

With itertools.combinations on a single list of unique values, no tuple contains a repeated element, and the tuples come out in sorted order when the input is sorted:

>>> import itertools
>>> lis = [1, 2, 4, 5, 6]
>>> list(itertools.combinations(lis, 3))
[(1, 2, 4), (1, 2, 5), (1, 2, 6), (1, 4, 5), (1, 4, 6), (1, 5, 6), (2, 4, 5),
(2, 4, 6), (2, 5, 6), (4, 5, 6)]
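
As a quick check, here is a small sketch that compares this against the output requested in the question. It assumes the question's three lists and merges them into one sorted list of unique values; the names merged, wanted and extra are only for illustration. The combinations result covers every requested tuple, but it also contains tuples that do not take exactly one element from each list.

```python
from itertools import combinations, product

lis1, lis2, lis3 = [1, 2], [2, 4], [5, 6]
merged = sorted(set(lis1 + lis2 + lis3))  # [1, 2, 4, 5, 6]

# Tuples the question asks for: one element per list, no repeated value.
wanted = {t for t in product(lis1, lis2, lis3) if len(set(t)) == len(t)}

# Tuples produced by combinations over the merged list.
from_combinations = set(combinations(merged, 3))

# Tuples that combinations yields but the question does not ask for.
extra = sorted(from_combinations - wanted)
print(extra)  # [(1, 2, 4), (1, 5, 6), (2, 5, 6), (4, 5, 6)]
```

So this approach fits when position within the original lists does not matter; otherwise the extra tuples would still need to be filtered out.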

Upvotes: 6

Tim Peters

Reputation: 70602

itertools generally works on unique positions within inputs, not on unique values. So when you want to remove duplicate values, you generally have to either post-process the itertools result sequence, or "roll your own". Because post-processing can be very inefficient in this case, roll your own:

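def uprod(*seqs):
    # Recursive generator: fill position i, then recurse into position i+1.
    def inner(i):
        if i == n:
            # Every position is filled, so emit the finished tuple.
            yield tuple(result)
            return
        # Only try values that are not already used earlier in the tuple.
        for elt in sets[i] - seen:
            seen.add(elt)
            result[i] = elt
            for t in inner(i+1):
                yield t
            # Backtrack so the value is available to other branches.
            seen.remove(elt)

    # Converting each input to a set drops duplicates within that input.
    sets = [set(seq) for seq in seqs]
    n = len(sets)
    seen = set()           # values used in the partial tuple so far
    result = [None] * n    # the partial tuple being built
    for t in inner(0):
        yield t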

Then, e.g.,

>>> list(uprod([1, 2, 1], [2, 4, 4], [5, 6, 5]))
[(1, 2, 5), (1, 2, 6), (1, 4, 5), (1, 4, 6), (2, 4, 5), (2, 4, 6)]
>>> list(uprod([1], [1, 2], [1, 2, 4], [1, 5, 6]))
[(1, 2, 4, 5), (1, 2, 4, 6)]
>>> list(uprod([1], [1, 2, 4], [1, 5, 6], [1]))
[]
>>> list(uprod([1, 2], [3, 4]))
[(1, 3), (1, 4), (2, 3), (2, 4)]

This can be much more efficient, since a duplicate value is never even considered (neither within an input iterable, nor across them).
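
Because the function above iterates over sets, the order of the yielded tuples depends on set iteration order. If a predictable order matters, a variant along these lines might help. This is only a sketch of the same backtracking idea, assuming Python 3.7+ (for yield from and insertion-ordered dicts); uprod_ordered and pools are names made up for this example.

```python
def uprod_ordered(*seqs):
    """Yield product tuples whose elements are all distinct, in input order."""
    # dict.fromkeys drops duplicates within one input, keeping first-seen order.
    pools = [list(dict.fromkeys(seq)) for seq in seqs]
    n = len(pools)
    seen = set()          # values already used in the partial tuple
    result = [None] * n   # the partial tuple being built

    def inner(i):
        if i == n:
            yield tuple(result)
            return
        for elt in pools[i]:
            if elt in seen:
                continue  # prune: this value already appears earlier
            seen.add(elt)
            result[i] = elt
            yield from inner(i + 1)
            seen.remove(elt)  # backtrack

    yield from inner(0)


print(list(uprod_ordered([1, 2, 1], [2, 4, 4], [5, 6, 5])))
# [(1, 2, 5), (1, 2, 6), (1, 4, 5), (1, 4, 6), (2, 4, 5), (2, 4, 6)]
```

The trade-off is a membership test per candidate instead of a set difference per level; the pruning itself is unchanged.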

Upvotes: 12

thefourtheye

Reputation: 239473

from itertools import product

lis1 = [1,2]
lis2 = [2,4]
lis3 = [5,6]
# Keep only tuples whose elements are all distinct.
print([i for i in product(lis1,lis2,lis3) if len(set(i)) == len(i)])

Output

[(1, 2, 5), (1, 2, 6), (1, 4, 5), (1, 4, 6), (2, 4, 5), (2, 4, 6)]
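
The same filter can be wrapped in a generator so it works for any number of lists and does not build the whole result in memory. This is just a sketch; distinct_product is a hypothetical helper name, and it assumes the elements are hashable. Note that it still generates every tuple of the full product and only discards the unwanted ones afterwards.

```python
from itertools import product


def distinct_product(*seqs):
    """Yield tuples from product(*seqs) that contain no repeated value."""
    for combo in product(*seqs):
        # A tuple has no duplicates exactly when its set has the same size.
        if len(set(combo)) == len(combo):
            yield combo


print(list(distinct_product([1, 2], [2, 4], [5, 6])))
# [(1, 2, 5), (1, 2, 6), (1, 4, 5), (1, 4, 6), (2, 4, 5), (2, 4, 6)]
```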

Upvotes: 5
