Felix Emanuel
Felix Emanuel

Reputation: 123

In Python, efficiently filter subsequences of possible combinations

Is there a way to efficiently filter items from the list of subsequences of a list?

For a minimal example, consider the list l = [-2, -1, 1, 2] and the subsequences in itertools.combinations(l, r=2). Suppose I'd like to filter the subsequences of this list so that for every number i represented in the list, at most one of (-i, i) should be in any subsequence. In this example, the desired output is [(-2, -1), (-2, 1), (-1, 2), (1, 2)].

Here's a naive way of filtering:

from itertools import combinations

def filtered( iterable, r ):
    for i in combinations( iterable , r ):
        for x in i:
            if x*-1 in i: # condition line
                break
        else:
            yield i

But with increasing input size, this rejection approach wastes a lot of data. For example, filtered( list(range(-20,20), 5 ) will reject about 35% of the original subsequences. It's also much slower (about six times) than combinations().

Ideally, I'd like to keep a configurable condition so that filtering on non-numeric data remains possible.

Upvotes: 1

Views: 196

Answers (1)

fooiey
fooiey

Reputation: 1590

I think that AKX is somewhat right. You definitely need to iterate through all the possible combinations. You are doing this, as that outer for loop is O(n). However, the checking for duplicates in each combination is not optimized.

The inner for loop

for x in i:
    if x*-1 in i: # condition line
        break

Is not optimized. This is O(n^2) since the in operator iterates through the list.

You can make this inner for loop O(n) by using a hashset for each tuple (set() or dict() in python).

nums = set()
for num in my_tuple:
    if num*-1 in nums:
        break
    nums.add(num)

The difference is that a hashset has a lookup of O(1), so the in operator is O(1) instead of O(n). We just sacrifice a little bit of space complexity.

The last thing you could do is re-implement the combination method yourself to include this type of checking, so that tuples that violate your condition aren't produced in the first place.

If you want to do this, here is some source code for the itertools.combinations() function, and you just have add some validation to it.

Upvotes: 1

Related Questions