Reputation: 71
I have a large dict: 600 keys with items = big list (about 10000-20000 elements).
My goal is to get pairs from each list in dict and merge it in one list.
E.g. i have:
d1 = {'key1': ['a', 'b', 'c', 'd'], 'key2': ['f', 'a']}
Expected result:
d2 = ['a_b', 'a_c', 'a_d', 'b_c', 'b_d', 'c_d', 'a_f']
My code:
d2 = []
for k, v in d1.items():
for i, j in itertools.product(v, v):
if i>j:
a = "_".join(list(set([i, j])))
d2.append(a)
And I have a problem: in terminal my python script says 'Killed'.
This is probably due to inappropriate memory usage. Are there ways to solve this problem?
Upvotes: 0
Views: 228
Reputation: 1645
You could create a generator that doesn't involve itertools
:
def dic_comb_generator(d):
for val in d.values():
v = sorted(val)
for i in range(len(v)):
for j in range(i+1, len(v)):
yield v[i] + '_' + v[j]
Upvotes: 1
Reputation: 5660
You could do something like this:
import itertools as it
for l in d1.values():
for t in it.combinations(sorted(l), 2):
print("_".join(t))
Displaying:
a_b
a_c
a_d
b_c
b_d
c_d
a_f
Note: If you do not want it to be sorted, simply take out the sorted
function call.
Upvotes: 3
Reputation: 476813
What you describe is not a product
, but combinations
.
Furthermore if memory is an issue, you better use a generator so:
from itertools import combinations
def dic_comb_generator(d1):
for v in d1.values():
for t in combinations(sorted(v),2):
yield "%s_%s"%t
Here we use sorted(..)
to first sort the elements in v
such that the resulting tuples are sorted as well. In case you do not want the combinations to be sorted, but occur in the order of the list, you should remove the sorted(..)
function. Furthermore we use 2
since we construct combinations (tuples) with two elements.
If we materialize the output, we get:
>>> list(dic_comb_generator({'key1': ['a', 'b', 'c', 'd'], 'key2': ['f', 'a']}))
['a_b', 'a_c', 'a_d', 'b_c', 'b_d', 'c_d', 'a_f']
But if you use the generator in a for
loop, like:
for elem in dic_comb_generator(d1):
print(elem)
Python will not construct a list with all elements: all elements will be generated, but if you do not store them, then the memory used for the first item emitted, can be reused for the second item. Especially in the context of products, combinations,... where the number of elements can be huge, this can pay off: storing a list of 100M+ results in a huge memory burden, whereas handling one element at the time has constant memory usage.
Upvotes: 6