Reputation: 123
Is there a way to efficiently filter items from the list of subsequences of a list?
For a minimal example, consider the list l = [-2, -1, 1, 2]
and the subsequences in itertools.combinations(l, r=2)
. Suppose I'd like to filter the subsequences of this list so that for every number i represented in the list, at most one of (-i, i) should be in any subsequence. In this example, the desired output is [(-2, -1), (-2, 1), (-1, 2), (1, 2)]
.
Here's a naive way of filtering:
from itertools import combinations
def filtered( iterable, r ):
for i in combinations( iterable , r ):
for x in i:
if x*-1 in i: # condition line
break
else:
yield i
But with increasing input size, this rejection approach wastes a lot of data. For example, filtered( list(range(-20,20), 5 )
will reject about 35% of the original subsequences. It's also much slower (about six times) than combinations()
.
Ideally, I'd like to keep a configurable condition so that filtering on non-numeric data remains possible.
Upvotes: 1
Views: 196
Reputation: 1590
I think that AKX is somewhat right. You definitely need to iterate through all the possible combinations. You are doing this, as that outer for loop is O(n). However, the checking for duplicates in each combination is not optimized.
The inner for loop
for x in i:
if x*-1 in i: # condition line
break
Is not optimized. This is O(n^2) since the in
operator iterates through the list.
You can make this inner for loop O(n) by using a hashset for each tuple (set() or dict() in python).
nums = set()
for num in my_tuple:
if num*-1 in nums:
break
nums.add(num)
The difference is that a hashset has a lookup of O(1), so the in
operator is O(1) instead of O(n). We just sacrifice a little bit of space complexity.
The last thing you could do is re-implement the combination method yourself to include this type of checking, so that tuples that violate your condition aren't produced in the first place.
If you want to do this, here is some source code for the itertools.combinations() function, and you just have add some validation to it.
Upvotes: 1