Reputation: 7683
This is a problem involving subsequences. The ordinary subsequence problem has a list l1
and asks whether another list l2
is a subsequence. For example:
l1 = [3, 1, 4, 1, 5, 9]
l2 = [3, 1, 5] => True (remove 4, 1, 9)
l2 = [4, 3, 5] => False
In my problem, l2
instead has sets of values. For example:
l1 = [3, 1, 4, 1, 5, 9]
l2 = [{2, 3}, {1, 5, 2}, {9}]
=> True, because:
{2, 3} matches 3
{1, 5, 2} matches 1 (or 5)
{9} matches 9
Here, l2
can be thought of conceptually expanded to different possibilities by taking one element from each set:
l2exp = [[2, 1, 9], [2, 5, 9], [2, 2, 9], [3, 1, 9], [3, 5, 9], [3, 2, 9]]
which means as long as one of those six possible lists represented by l2
matches l1
, we have a successful match. Since [3, 1, 9]
matches l1
, the whole l2
matches.
So one solution may be first flatten the l2
into the new l2exp
like above, and then for each sublist_l2
in l2exp
, the ordinary subsequence check all(e in iter_of_l1 for e in sublist_l2)
can be used.
How to do this match without explicitly expanding the l2
into a list of lists?
Upvotes: 1
Views: 562
Reputation: 27629
Instead of checking equality of the l1
and l2
elements, check for membership of the l1
elements in the l2
elements.
Some possible implementations (Try it online!):
def subseq(l1, l2):
it1 = iter(l1)
return all(any(x in pool for x in it1)
for pool in l2)
def subseq(l1, l2):
it1 = iter(l1)
return all(not pool.isdisjoint(it1)
for pool in l2)
def subseq(l1, l2):
it1 = iter(l1)
return not any(pool.isdisjoint(it1)
for pool in l2)
def subseq(l1, l2):
return not any(map(set.isdisjoint, l2, repeat(iter(l1))))
def subseq(l1, l2):
it1 = iter(l1)
for pool in l2:
for x in it1:
if x in pool:
break
else:
return
return True
Testing with
l1 = [0, 1, 2, 3, ..., 39]
l2 = [{0, 1}, {2, 3}, ..., {40, 41}]
Times:
5254.183 ms subseq_product
0.020 ms subseq_all_any
0.006 ms subseq_all_not_disjoint
0.005 ms subseq_not_any_disjoint
0.005 ms subseq_map_disjoint
0.003 ms subseq_loops
1279.553 ms subseq_Alain
0.018 ms subseq_Alain_faster
subseq_product
does the idea from the question, going through all 221 possibilities of l2
, doing an ordinary subsequence check for each.subseq_Alain
tries 220 (?) recursive paths. The _faster
version greedily takes each l2
-element's first match and doesn't try further.Trying the fast solutions with longer inputs:
l1 = [0, 1, 2, 3, ..., 999]
l2 = [{0, 1}, {2, 3}, ..., {1000, 1001}]
Times:
0.285 ms subseq_all_any
0.058 ms subseq_all_not_disjoint
0.057 ms subseq_not_any_disjoint
0.031 ms subseq_map_disjoint
0.044 ms subseq_loops
1.488 ms subseq_Alain_faster
Full benchmark code (Try it online!):
from timeit import timeit
from itertools import product, repeat
def subseq_product(l1, l2):
def is_subseq(x, y):
# Ordinary subsequence test,
# see https://stackoverflow.com/a/24017747
it = iter(y)
return all(c in it for c in x)
return any(is_subseq(p2, l1)
for p2 in product(*l2))
def subseq_all_any(l1, l2):
it1 = iter(l1)
return all(any(x in pool for x in it1)
for pool in l2)
def subseq_all_not_disjoint(l1, l2):
it1 = iter(l1)
return all(not pool.isdisjoint(it1)
for pool in l2)
def subseq_not_any_disjoint(l1, l2):
it1 = iter(l1)
return not any(pool.isdisjoint(it1)
for pool in l2)
def subseq_map_disjoint(l1, l2):
return not any(map(set.isdisjoint, l2, repeat(iter(l1))))
def subseq_loops(l1, l2):
it1 = iter(l1)
for pool in l2:
for x in it1:
if x in pool:
break
else:
return
return True
def subseq_Alain(A, S):
if not S: return True # all sets matched
for i,n in enumerate(A): # check for membership in 1st set
if n in S[0] and subseq_Alain(A[i+1:],S[1:]): # and matching rest
return True # found full match
return False
def subseq_Alain_faster(A, S):
if not S: return True # all sets matched
for i,n in enumerate(A): # check for membership in 1st set
if n in S[0]:
return subseq_Alain_faster(A[i+1:],S[1:]) # and matching rest
return False
def benchmark(funcs, args, number):
for _ in range(3):
for func in funcs:
t = timeit(lambda: func(*args), number=number) / number
print('%8.3f ms ' % (t * 1e3), func.__name__)
print()
l1 = list(range(40))
l2 = [{i, i+1} for i in range(0, 42, 2)]
funcs = [
subseq_product,
subseq_all_any,
subseq_all_not_disjoint,
subseq_not_any_disjoint,
subseq_map_disjoint,
subseq_loops,
subseq_Alain,
subseq_Alain_faster,
]
benchmark(funcs, (l1, l2), 1)
l1 = list(range(1000))
l2 = [{i, i+1} for i in range(0, 1002, 2)]
funcs = [
subseq_all_any,
subseq_all_not_disjoint,
subseq_not_any_disjoint,
subseq_map_disjoint,
subseq_loops,
subseq_Alain_faster,
]
benchmark(funcs, (l1, l2), 500)
Upvotes: 3
Reputation: 27629
Instead of just knowing whether there's a match, one might want to also know what that match is. Maybe to truly use it, or maybe just to be able to verify the correctness. Here are a few versions, returning a list of the actual matched elements, or None
if there's no match. The first two might be the best.
Perhaps the simplest, assume each pool from l2
does have a match in l1
, and catch the exception:
def subseq1(l1, l2):
it1 = iter(l1)
try:
return [next(x for x in it1 if x in pool)
for pool in l2]
except StopIteration:
pass
Basic loops:
def subseq2(l1, l2):
witness = []
it1 = iter(l1)
for pool in l2:
for x in it1:
if x in pool:
witness.append(x)
break
else:
return
return witness
Slightly modifying my original if(all(any(...
solution to append each witnessing element:
def subseq3(l1, l2):
witness = []
it1 = iter(l1)
if all(any(x in pool and not witness.append(x)
for x in it1)
for pool in l2):
return witness
Pre-append a spot for the next witness element:
def subseq4(l1, l2):
witness = []
it1 = iter(l1)
if all(any(witness[-1] in pool
for witness[-1] in it1)
for pool in l2
if not witness.append(None)):
return witness
Pre-allocate spots for all witness elements:
def subseq5(l1, l2):
witness = [None] * len(l2)
it1 = iter(l1)
if all(any(witness[i] in pool
for witness[i] in it1)
for i, pool in enumerate(l2)):
return witness
Check for false witnesses at the end:
def subseq6(l1, l2):
it1 = iter(l1)
false = object()
witness = [next((x for x in it1 if x in pool), false)
for pool in l2]
if false not in witness:
return witness
Testing code (cases taken from Alain's answer):
funcs = [
subseq1,
subseq2,
subseq3,
subseq4,
subseq5,
subseq6,
]
for func in funcs:
l1 = [1, 2, 3, 4]
l2 = [{2, 1}, {1, 3}]
print(func(l1,l2)) # True
l1 = [1, 2, 3, 4]
l2 = [{2, 1}, {1, 3}, {3, 4}]
print(func(l1,l2)) # True
l1 = [1, 2, 4, 3]
l2 = [{2, 1}, {1, 3}, {3, 4}]
print(func(l1,l2)) # False
print()
Upvotes: 1
Reputation: 42133
If your list of sets can have more than two items, perhaps a recursive function would be best:
def subMatch(A,S):
if not S: return True # all sets matched
for i,n in enumerate(A): # check for membership in 1st set
if n in S[0] and subMatch(A[i+1:],S[1:]): # and matching rest
return True # found full match
return False
Output:
l1 = [1, 2, 3, 4]
l2 = [{2, 1}, {1, 3}]
print(subMatch(l1,l2)) # True
l1 = [1, 2, 3, 4]
l2 = [{2, 1}, {1, 3}, {3, 4}]
print(subMatch(l1,l2)) # True
l1 = [1, 2, 4, 3]
l2 = [{2, 1}, {1, 3}, {3, 4}]
print(subMatch(l1,l2)) # False
Upvotes: 1