Anita

Reputation: 285

Combinations of unique products with itertools

I have a nested list and would like to form pairs (products) of two items taken from different sublists.

test = [[('juice', 'NOUN'), ('orange', 'FLAVOR')], 
        [('juice', 'NOUN'), ('orange', 'FLAVOR'), ('lemon', 'FLAVOR')],
        [('orange', 'FLAVOR'), ('chip', 'NOUN')]]

What I expect is something like this:

[(('juice', 'NOUN'), ('lemon', 'FLAVOR')), 
 (('juice', 'NOUN'), ('chip', 'NOUN')),
 (('orange', 'FLAVOR'), ('lemon', 'FLAVOR')),
 (('orange', 'FLAVOR'), ('chip', 'NOUN')),
 (('lemon', 'FLAVOR'), ('chip', 'NOUN'))]

That is to say, I would like to get the combinations across lists, but only for unique items. I would prefer to use itertools. Previously, I tried list(itertools.product(*test)), but I realized it picks one item from every sublist at once, so the number of results is the product of the sublist lengths and each result holds three items instead of two.

My current code:

import itertools

unique_list = list(set(itertools.chain(*test)))
list(itertools.combinations(unique_list, 2))

My thought process is to get the unique items in the nested list first, so that the nested list becomes [[('juice', 'NOUN'), ('orange', 'FLAVOR')], [('lemon', 'FLAVOR')], [('chip', 'NOUN')]], and then use itertools.combinations to pair them up. Yet my code also pairs items within the same sublist (i.e. juice and orange appear together), which I do not want in my results.

Upvotes: 1

Views: 483

Answers (1)

Ma0

Reputation: 15204

This does what you want without hard-coding the number of sublists to 3:

Input:

test = [[('juice', 'NOUN'), ('orange', 'FLAVOR')], 
        [('juice', 'NOUN'), ('orange', 'FLAVOR'), ('lemon', 'FLAVOR')],
        [('juice', 'NOUN'), ('chip', 'NOUN')]]

First, reformat input to remove duplicates (see note 1):

test = [[x for x in sublist if x not in sum(test[:i], [])] for i, sublist in enumerate(test)]

Finally, take every pair of sublists with combinations and get the product of each pair.

from itertools import combinations, product

for c in combinations(test, 2):
    for x in product(*c):
        print(x)

which produces:

(('juice', 'NOUN'), ('lemon', 'FLAVOR'))
(('orange', 'FLAVOR'), ('lemon', 'FLAVOR'))
(('juice', 'NOUN'), ('chip', 'NOUN'))
(('orange', 'FLAVOR'), ('chip', 'NOUN'))
(('lemon', 'FLAVOR'), ('chip', 'NOUN'))

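  1. This removes inner tuples that were already seen in any of the previous sublists. The magic is done by sum(test[:i], []), which concatenates all the previous sublists into one flat list, so a single membership check is enough. Inside the comprehension, test still refers to the original list, because the name is rebound only after the whole comprehension has been evaluated.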
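
For longer inputs, the same first-seen filtering can be sketched with a running set instead of rebuilding the flat list for every element. This is only a minimal sketch and not part of the original answer: the helper name drop_seen is hypothetical, and it assumes the inner items are hashable (tuples of strings are). Unlike the one-liner, it also drops repeats inside a single sublist.

```python
def drop_seen(nested):
    """Keep each item only in the first sublist where it appears."""
    seen = set()          # every item encountered so far
    result = []
    for sublist in nested:
        kept = []
        for item in sublist:
            if item not in seen:
                seen.add(item)
                kept.append(item)
        result.append(kept)
    return result


test = [[('juice', 'NOUN'), ('orange', 'FLAVOR')],
        [('juice', 'NOUN'), ('orange', 'FLAVOR'), ('lemon', 'FLAVOR')],
        [('juice', 'NOUN'), ('chip', 'NOUN')]]

deduped = drop_seen(test)
print(deduped)
```

The result has the same shape as the reformatted test, so it can be passed to combinations in the same way.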

The loop can also be written as a single list comprehension, for compactness and style points:

res = [x for c in combinations(test, 2) for x in product(*c)]
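
Putting the two steps together, one possible end-to-end sketch looks like the following. The function name cross_sublist_pairs is an illustrative choice, not something from itertools, and the set-based filtering is a swapped-in variant of the sum-based step that assumes hashable items. It is run here on the list from the question.

```python
from itertools import combinations, product


def cross_sublist_pairs(nested):
    """Pair items from different sublists, counting each item only once."""
    seen = set()
    groups = []
    for sublist in nested:
        # dict.fromkeys drops repeats within the sublist while keeping order
        fresh = [x for x in dict.fromkeys(sublist) if x not in seen]
        seen.update(fresh)
        groups.append(fresh)
    # Every pair of groups, then every cross pair between the two groups
    return [pair for a, b in combinations(groups, 2) for pair in product(a, b)]


test = [[('juice', 'NOUN'), ('orange', 'FLAVOR')],
        [('juice', 'NOUN'), ('orange', 'FLAVOR'), ('lemon', 'FLAVOR')],
        [('orange', 'FLAVOR'), ('chip', 'NOUN')]]

pairs = cross_sublist_pairs(test)
for pair in pairs:
    print(pair)
```

Sublists that end up empty after filtering contribute nothing, since product with an empty list yields no pairs.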

Upvotes: 1
