uzver
uzver

Reputation: 71

Get all possible pairs from multiple large lists

I have a large dict: 600 keys with items = big list (about 10000-20000 elements).

My goal is to get pairs from each list in dict and merge it in one list.

E.g. i have:

d1 = {'key1': ['a', 'b', 'c', 'd'], 'key2': ['f', 'a']}

Expected result:

d2 = ['a_b', 'a_c', 'a_d', 'b_c', 'b_d', 'c_d', 'a_f']

My code:

d2 = [] 
for k, v in d1.items():
    for i, j in itertools.product(v, v):
        if i>j:
            a = "_".join(list(set([i, j])))
            d2.append(a)

And I have a problem: in terminal my python script says 'Killed'.

This is probably due to inappropriate memory usage. Are there ways to solve this problem?

Upvotes: 0

Views: 228

Answers (3)

enedil
enedil

Reputation: 1645

You could create a generator that doesn't involve itertools:

def dic_comb_generator(d):
    for val in d.values():
        v = sorted(val)
        for i in range(len(v)):
            for j in range(i+1, len(v)):
                yield v[i] + '_' + v[j]

Upvotes: 1

rassar
rassar

Reputation: 5660

You could do something like this:

import itertools as it
for l in d1.values():
    for t in it.combinations(sorted(l), 2):
        print("_".join(t))

Displaying:

a_b
a_c
a_d
b_c
b_d
c_d
a_f

Note: If you do not want it to be sorted, simply take out the sorted function call.

Upvotes: 3

willeM_ Van Onsem
willeM_ Van Onsem

Reputation: 476813

What you describe is not a product, but combinations.

Furthermore if memory is an issue, you better use a generator so:

from itertools import combinations

def dic_comb_generator(d1):
    for v in d1.values():
        for t in combinations(sorted(v),2):
            yield "%s_%s"%t

Here we use sorted(..) to first sort the elements in v such that the resulting tuples are sorted as well. In case you do not want the combinations to be sorted, but occur in the order of the list, you should remove the sorted(..) function. Furthermore we use 2 since we construct combinations (tuples) with two elements.

If we materialize the output, we get:

>>> list(dic_comb_generator({'key1': ['a', 'b', 'c', 'd'], 'key2': ['f', 'a']}))
['a_b', 'a_c', 'a_d', 'b_c', 'b_d', 'c_d', 'a_f']

But if you use the generator in a for loop, like:

for elem in dic_comb_generator(d1):
    print(elem)

Python will not construct a list with all elements: all elements will be generated, but if you do not store them, then the memory used for the first item emitted, can be reused for the second item. Especially in the context of products, combinations,... where the number of elements can be huge, this can pay off: storing a list of 100M+ results in a huge memory burden, whereas handling one element at the time has constant memory usage.

Upvotes: 6

Related Questions