user12373228
user12373228

Reputation:

How do I make a list of sets of n+1 elements out of a list of sets of n elements efficiently in Python?

Suppose I have a list of sets of two elements each. I want to build a list of sets of three elements out of this.

For example, my input list is [{1, 2}, {2, 3}, {1, 3}, {3, 4}, {4, 5}].

I want an output containing all the possible sets of three out of the flattened input list. That is,

[{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 5}, {1, 3, 4}, {1, 4, 5} {2, 3, 4}, {2, 3, 5}, {3, 4, 5}, {2, 4, 5}]

I ran the following code to try to do this using pandas and itertools library functions:

def build_sets(item_list, set_size):
    flattened_item_list = list(set(flatten(item_list)))
    combinations = itertools.combinations(range(len(flattened_item_list)), set_size)
    return_list = []
    for i in combinations:
        combination_element = []
        for j in i:
            combination_element.append(flattened_item_list[j])
        return_list.append(set(combination_element))
    return return_list

Here set_size refers to the size of each set in the desired output list. This code works well enough for smaller inputs, but if the input list contains thousands of sets, the code takes a very long time to run.

How can I try to do the same task but for input of the order of ten thousand elements quickly enough?

Thank you.

Upvotes: 2

Views: 44

Answers (1)

Thierry Lathuille
Thierry Lathuille

Reputation: 24232

You're overcomplicating it. Just generate combinations of your values directly, no need to go through the use of indices:

from itertools import combinations

data = [{1, 2}, {2, 3}, {1, 3}, {3, 4}, {4, 5}]
distinct = set.union(*data)

out = [set(c) for c in combinations(distinct, r=3)]

print(out)
# [{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5}, {1, 4, 5}, {2, 3, 4}, {2, 3, 5}, {2, 4, 5}, {3, 4, 5}]

So, your function would simply be:

def build_sets(item_list, set_size):
    distinct = set.union(*item_list)
    return [set(c) for c in combinations(distinct, r=set_size)]

As the returned list may be rather large, you might want to use a generator that would yield the sets one by one instead :

def sets_generator(item_list, set_size):
    distinct = set.union(*item_list)
    for c in combinations(distinct, r=set_size):
        yield set(c)

And use it like this:

for s in sets_generator(data, 3):
    print(s)

Output:

{1, 2, 3}
{1, 2, 4}
{1, 2, 5}
{1, 3, 4}
{1, 3, 5}
{1, 4, 5}
{2, 3, 4}
{2, 3, 5}
{2, 4, 5}
{3, 4, 5}

Upvotes: 5

Related Questions