Reputation: 39
productcode = ['apple','orange','melons'] # 1000+ more
pairs = []
count = 0
for xi,x in enumerate(productcode):
    del productcode[xi]
    for yi,y in enumerate(productcode):
        count += 1
        p = (x,y)
        pairs.append(p)
print ("Number of distinct pairs:",count)
productcode contains over a thousand data items:
apple
orange
grape
Expected output:
apple orange
apple grape
orange grape
The nested for loops only iterate over half the items in the list (productcode), so I end up with far fewer pairs than I expect. Could anyone explain what I've done wrong, or what is actually happening?
Upvotes: 2
Views: 3511
Reputation: 164843
itertools.combinations is a natural choice for this. To avoid duplicates, just convert your list to a set first. There are two similar solutions, depending on whether you need ordered results.
Ordered
from itertools import combinations
productcode = ['apple', 'orange', 'grape']
res_lst = sorted(map(sorted, combinations(set(productcode), 2)))
# [['apple', 'grape'], ['apple', 'orange'], ['grape', 'orange']]
I'm not sure what order you require, so I've sorted both within and across sublists, alphabetically in each case.
Unordered
If order is unimportant anywhere, then you need to use a set of frozenset items:
res_set = set(map(frozenset, combinations(set(productcode), 2)))
# {frozenset({'apple', 'orange'}),
# frozenset({'grape', 'orange'}),
# frozenset({'apple', 'grape'})}
This is because set items must be hashable, and a plain set is not; frozenset is an immutable, hashable version of set. It also gives you a natural way to test whether a pair is in the set, regardless of the order of its two items. For example:
{'orange', 'apple'} in res_set # True
Another way is to use a set of alphabetically sorted tuples.
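A minimal sketch of that sorted-tuple variant, assuming the items are strings that sort alphabetically; the name res_tuples and the duplicated 'apple' are made up for illustration:

```python
from itertools import combinations

# Sample data with a deliberate duplicate (illustrative only).
productcode = ['apple', 'orange', 'grape', 'apple']

# Deduplicate first, then store each pair as an alphabetically sorted tuple.
res_tuples = {tuple(sorted(pair)) for pair in combinations(set(productcode), 2)}

# To test membership, normalise the query the same way.
query = ('orange', 'apple')
found = tuple(sorted(query)) in res_tuples
print(found)  # True
```

Compared with frozenset, the tuples keep a predictable order within each pair, at the cost of having to sort every query before the lookup.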
Upvotes: 5
Reputation: 51683
You modify a collection while iterating over it. Bad idea: each del shifts the remaining items left, so the loop skips elements.
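A small sketch of what goes wrong, using a made-up four-item list rather than the real data. The first loop mirrors the one in the question; the second half shows one possible fix that slices the list instead of deleting from it:

```python
# Reproduce the question's loop on a small, made-up list.
productcode = ['apple', 'orange', 'grape', 'melon']
buggy_pairs = []
for xi, x in enumerate(productcode):
    del productcode[xi]            # mutates the list being iterated
    for y in productcode:
        buggy_pairs.append((x, y))

# 'orange' and 'melon' are never visited as x, so pairs go missing.
print(len(buggy_pairs))            # 5 instead of the expected 6

# One possible fix: leave the list alone and pair each item
# with the items that come after it.
items = ['apple', 'orange', 'grape', 'melon']
fixed_pairs = [(x, y) for i, x in enumerate(items) for y in items[i + 1:]]
print(len(fixed_pairs))            # 6
```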
There is this cool data structure that gets rid of duplicates, the set:
Create plenty of duplicate data:
from itertools import combinations
# make all 2-length combinations of 1,2,3,1,2,3,4,5,3
comb = list(combinations([ 1,2,3,1,2,3,4,5,3 ],2)) # works with strings as well
print(comb)
Output:
[(1, 2), (1, 3), (1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
(1, 3), (2, 3), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5),
(2, 3), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 3),
(1, 2), (1, 3), (1, 4), (1, 5), (1, 3), (2, 3), (2, 4),
(2, 5), (2, 3), (3, 4), (3, 5), (3, 3), (4, 5), (4, 3), (5, 3)]
Make data unique:
uniques = set(comb)
print(uniques)
Output:
set([(1, 2), (3, 2), (1, 3), (3, 3), (4, 5), (3, 1), (1, 4),
(2, 4), (1, 5), (2, 3), (2, 1), (4, 3), (2, 2), (2, 5),
(5, 3), (3, 4), (1, 1), (3, 5)])
If you need all combinations of something, put the somethings into a set beforehand to eliminate all dupes, then create your combinations via itertools.combinations from the set. If you use combinations on a list with dupes, you create many unneeded combinations, so build the set first and take the combinations from it.
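To make the difference concrete, here is a rough comparison on the same numbers as above. The variable names are illustrative, and sorted() is only there to give the set a stable order:

```python
from itertools import combinations

data = [1, 2, 3, 1, 2, 3, 4, 5, 3]

# Combinations straight from the list: duplicates multiply.
from_list = list(combinations(data, 2))

# Deduplicating afterwards still keeps mirrored pairs such as (1, 2)
# and (2, 1), and self-pairs such as (1, 1).
dedup_after = set(from_list)

# Deduplicating first yields only the genuinely distinct pairs.
from_set = list(combinations(sorted(set(data)), 2))

print(len(from_list), len(dedup_after), len(from_set))  # 36 18 10
```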
The catch with sets (and dicts) is that their items (or keys) must be hashable, which in practice means immutable: tuples of hashable items and strings work well, lists do not. You can use tuple(alist) to convert a list if you need to.
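A short illustration of that restriction; the names seen and alist are arbitrary:

```python
seen = set()
alist = ['apple', 'orange']

# A list cannot go into a set: it is mutable and therefore unhashable.
try:
    seen.add(alist)
    error = None
except TypeError as exc:
    error = str(exc)

# Converting the list to a tuple makes it usable; strings work as they are.
seen.add(tuple(alist))
seen.add('grape')

print(error is not None)  # True: adding the list raised TypeError
print(seen == {('apple', 'orange'), 'grape'})  # True
```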
Upvotes: 2