JubG

Reputation: 39

Building distinct pairs in Python

productcode = ['apple','orange','melons'] # 1000+ more
pairs = []
pc2 = 0  # number of pairs built (the loop below increments pc2)
for xi,x in enumerate(productcode):
    del productcode[xi]
    for yi,y in enumerate(productcode):
        pc2 += 1
        p = (x,y)
        pairs.append(p)

print ("Number of distinct pairs:",pc2)

productcode contains over a thousand data items:

apple

orange

grape

Expected output:

apple orange

apple grape

orange grape

The nested for loops only iterate over about half the items in the list (productcode), so I end up with far fewer pairs than I expect. Could anyone explain what I've done wrong, or what is actually happening?

Upvotes: 2

Views: 3511

Answers (2)

jpp

Reputation: 164843

itertools.combinations is a natural choice for this. To avoid duplicates, just convert your list to a set first. There are 2 similar solutions, depending on whether you need ordered results.

Ordered

from itertools import combinations

productcode = ['apple', 'orange', 'grape']

res_lst = sorted(map(sorted, combinations(set(productcode), 2)))

# [['apple', 'grape'], ['apple', 'orange'], ['grape', 'orange']]

I'm not sure what order you require, so I've sorted both within and across sublists, alphabetically in each case.
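Because itertools.combinations emits tuples in the order of its input, sorting the de-duplicated items before combining is another way to get ordered results without a second sort. A minimal sketch, assuming tuples are acceptable instead of lists (the sample list with a repeated 'apple' is made up for illustration):

```python
from itertools import combinations

productcode = ['apple', 'orange', 'grape', 'apple']  # sample data with a duplicate

# De-duplicate, sort once, then combine: each pair comes out as (smaller, larger)
# and the pairs themselves come out in lexicographic order.
res = list(combinations(sorted(set(productcode)), 2))
print(res)
# [('apple', 'grape'), ('apple', 'orange'), ('grape', 'orange')]
```

Sorting n items once is cheaper than sorting every pair and then the whole result list.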

Unordered

If order is unimportant anywhere, then you need to use a set of frozenset items:

res_set = set(map(frozenset, combinations(set(productcode), 2)))

# {frozenset({'apple', 'orange'}),
#  frozenset({'grape', 'orange'}),
#  frozenset({'apple', 'grape'})}

This is because set items must be hashable, and a mutable set is not; frozenset is an immutable (and hashable) version of set. A set of frozensets also makes it natural to test whether a pair is present, regardless of the order of its two items. For example:

{'orange', 'apple'} in res_set  # True
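A short sketch of why frozenset is needed here: adding a plain set to a set fails with TypeError, while a frozenset is accepted and lookups match regardless of element order. The name pairs is only illustrative.

```python
pairs = set()

try:
    pairs.add({'apple', 'orange'})  # a mutable set is unhashable ...
    added_plain_set = True
except TypeError:
    added_plain_set = False  # ... so this branch runs
print('plain set accepted:', added_plain_set)  # plain set accepted: False

pairs.add(frozenset({'apple', 'orange'}))  # frozenset is hashable
print(frozenset({'orange', 'apple'}) in pairs)  # True: element order doesn't matter
print({'orange', 'apple'} in pairs)             # True: a plain set works for lookups
```

The last lookup works because set membership tests accept a plain set and compare it as an equivalent frozenset.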

Another way is to use a set of alphabetically sorted tuples.
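A minimal sketch of the sorted-tuple variant, assuming string items as in the question. Each pair is put into alphabetical order with tuple(sorted(...)), so ('orange', 'apple') and ('apple', 'orange') become the same key. A query must be normalised the same way; normalise is a hypothetical helper name, not part of any library.

```python
from itertools import combinations

productcode = ['apple', 'orange', 'grape']

def normalise(pair):
    """Hypothetical helper: put a pair into canonical (alphabetical) order."""
    return tuple(sorted(pair))

res_tuples = {normalise(p) for p in combinations(set(productcode), 2)}
print(sorted(res_tuples))
# [('apple', 'grape'), ('apple', 'orange'), ('grape', 'orange')]

print(normalise(('orange', 'apple')) in res_tuples)  # True
```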

Upvotes: 5

Patrick Artner

Reputation: 51683

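You modify a collection while iterating over it. Bad idea: each del productcode[xi] shifts the remaining items one slot to the left, so the outer enumerate skips the item that moves into the freed slot.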
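To see this concretely, here is a small sketch that replays the question's loop on four sample items and records which items the outer loop actually picks up as x. It also builds the pairs without mutating the list by pairing each index only with the indices after it. buggy_pairs and all_pairs are illustrative names.

```python
def buggy_pairs(items):
    """Replay the question's loop: delete from the list while enumerating it."""
    items = list(items)  # work on a copy so the caller's list is untouched
    visited, pairs = [], []
    for xi, x in enumerate(items):
        visited.append(x)
        del items[xi]  # shifts every later item one slot to the left
        for y in items:
            pairs.append((x, y))
    return visited, pairs


def all_pairs(items):
    """Pair each item with every item after it, without modifying the list."""
    pairs = []
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            pairs.append((items[i], items[j]))
    return pairs


codes = ['apple', 'orange', 'grape', 'melons']
visited, pairs = buggy_pairs(codes)
print(visited)                # ['apple', 'grape']  ('orange' and 'melons' are never x)
print(len(pairs))             # 5
print(len(all_pairs(codes)))  # 6
```

With 1000 items the mutating loop runs its outer body only 500 times and produces 374,750 pairs instead of the expected 499,500.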

There is a handy data structure that gets rid of duplicates: the set.

Create plenty of duplicate data:

from itertools import combinations

# make all 2-length combinations of 1,2,3,1,2,3,4,5,3   
comb = list(combinations([ 1,2,3,1,2,3,4,5,3  ],2)) # works with strings as well
print(comb) 

Output:

[(1, 2), (1, 3), (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), 
 (1, 3), (2, 3), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), 
 (2, 3), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 3), 
 (1, 2), (1, 3), (1, 4), (1, 5), (1, 3), (2, 3), (2, 4), 
 (2, 5), (2, 3), (3, 4), (3, 5), (3, 3), (4, 5), (4, 3), (5, 3)]

Make data unique:

uniques = set(comb)
print(uniques)  

Output:

set([(1, 2), (3, 2), (1, 3), (3, 3), (4, 5), (3, 1), (1, 4), 
     (2, 4), (1, 5), (2, 3), (2, 1), (4, 3), (2, 2), (2, 5), 
     (5, 3), (3, 4), (1, 1), (3, 5)])

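If you need all combinations of something, put the somethings into a set beforehand to eliminate all duplicates, then create your combinations from the set via itertools.combinations. If you use combinations on a list with duplicates, you create many unneeded combinations, and calling set() on them afterwards only removes exact repeats: as the output above shows, (1, 2) and (2, 1) both survive, as do self-pairs like (1, 1). So: set first, then combinations from it.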
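A quick sketch of the difference on the same sample data. Combining the raw list yields 36 tuples and set() trims them to 18, while de-duplicating first yields only the 10 distinct pairs of the 5 distinct values. The math.comb check assumes Python 3.8 or newer; sorted() is added so each pair comes out as (smaller, larger).

```python
from itertools import combinations
from math import comb  # Python 3.8+

data = [1, 2, 3, 1, 2, 3, 4, 5, 3]

from_list = set(combinations(data, 2))               # dedupe after combining
from_set = list(combinations(sorted(set(data)), 2))  # dedupe before combining

print(len(list(combinations(data, 2))))           # 36
print(len(from_list))                             # 18
print(len(from_set))                              # 10
print(len(from_set) == comb(len(set(data)), 2))   # True
```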


The catch with sets (and dict keys) is that their elements must be hashable, which in practice means immutable: tuples and strings work, lists do not. Convert a list with tuple(alist) if you need to.
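A small sketch of the tuple(alist) conversion, using made-up list-valued pairs. Each list is turned into a tuple before it goes into the set. This only works if the tuple's own contents are hashable, as strings are here.

```python
pairs_as_lists = [['apple', 'orange'], ['apple', 'grape'], ['apple', 'orange']]

# set(pairs_as_lists) would raise TypeError because lists are unhashable,
# so convert each inner list to a tuple first.
unique_pairs = {tuple(p) for p in pairs_as_lists}
print(sorted(unique_pairs))
# [('apple', 'grape'), ('apple', 'orange')]
```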

Upvotes: 2
