marlon
marlon

Reputation: 7683

Subsequence match for a simple list and a list of sets?

This is a problem involving subsequences. The ordinary subsequence problem has a list l1 and asks whether another list l2 is a subsequence. For example:

l1 = [3, 1, 4, 1, 5, 9]

l2 = [3, 1, 5]            => True (remove 4, 1, 9)

l2 = [4, 3, 5]            => False

In my problem, l2 instead has sets of values. For example:

l1 = [3, 1, 4, 1, 5, 9]
l2 = [{2, 3}, {1, 5, 2}, {9}]
=> True, because:
     {2, 3} matches 3
     {1, 5, 2} matches 1 (or 5)
     {9} matches 9

Here, l2 can be thought of conceptually expanded to different possibilities by taking one element from each set:

l2exp = [[2, 1, 9], [2, 5, 9], [2, 2, 9], [3, 1, 9], [3, 5, 9], [3, 2, 9]]

which means as long as one of those six possible lists represented by l2 matches l1, we have a successful match. Since [3, 1, 9] matches l1 , the whole l2 matches.

So one solution may be first flatten the l2 into the new l2exp like above, and then for each sublist_l2 in l2exp, the ordinary subsequence check all(e in iter_of_l1 for e in sublist_l2) can be used.

How to do this match without explicitly expanding the l2 into a list of lists?

Upvotes: 1

Views: 562

Answers (3)

Kelly Bundy
Kelly Bundy

Reputation: 27629

Solutions

Instead of checking equality of the l1 and l2 elements, check for membership of the l1 elements in the l2 elements.

Some possible implementations (Try it online!):

def subseq(l1, l2):
    it1 = iter(l1)
    return all(any(x in pool for x in it1)
               for pool in l2)

def subseq(l1, l2):
    it1 = iter(l1)
    return all(not pool.isdisjoint(it1)
               for pool in l2)

def subseq(l1, l2):
    it1 = iter(l1)
    return not any(pool.isdisjoint(it1)
                   for pool in l2)

def subseq(l1, l2):
    return not any(map(set.isdisjoint, l2, repeat(iter(l1))))

def subseq(l1, l2):
    it1 = iter(l1)
    for pool in l2:
        for x in it1:
            if x in pool:
                break
        else:
            return
    return True

Benchmarks

Testing with

l1 = [0, 1, 2, 3, ..., 39]
l2 = [{0, 1}, {2, 3}, ..., {40, 41}]

Times:

5254.183 ms  subseq_product
   0.020 ms  subseq_all_any
   0.006 ms  subseq_all_not_disjoint
   0.005 ms  subseq_not_any_disjoint
   0.005 ms  subseq_map_disjoint
   0.003 ms  subseq_loops
1279.553 ms  subseq_Alain
   0.018 ms  subseq_Alain_faster
  • subseq_product does the idea from the question, going through all 221 possibilities of l2, doing an ordinary subsequence check for each.
  • subseq_Alain tries 220 (?) recursive paths. The _faster version greedily takes each l2-element's first match and doesn't try further.

Trying the fast solutions with longer inputs:

l1 = [0, 1, 2, 3, ..., 999]
l2 = [{0, 1}, {2, 3}, ..., {1000, 1001}]

Times:

   0.285 ms  subseq_all_any
   0.058 ms  subseq_all_not_disjoint
   0.057 ms  subseq_not_any_disjoint
   0.031 ms  subseq_map_disjoint
   0.044 ms  subseq_loops
   1.488 ms  subseq_Alain_faster

Full benchmark code (Try it online!):

from timeit import timeit
from itertools import product, repeat

def subseq_product(l1, l2):

    def is_subseq(x, y):
        # Ordinary subsequence test,
        # see https://stackoverflow.com/a/24017747
        it = iter(y)
        return all(c in it for c in x)

    return any(is_subseq(p2, l1)
               for p2 in product(*l2))

def subseq_all_any(l1, l2):
    it1 = iter(l1)
    return all(any(x in pool for x in it1)
               for pool in l2)

def subseq_all_not_disjoint(l1, l2):
    it1 = iter(l1)
    return all(not pool.isdisjoint(it1)
               for pool in l2)

def subseq_not_any_disjoint(l1, l2):
    it1 = iter(l1)
    return not any(pool.isdisjoint(it1)
                   for pool in l2)

def subseq_map_disjoint(l1, l2):
    return not any(map(set.isdisjoint, l2, repeat(iter(l1))))

def subseq_loops(l1, l2):
    it1 = iter(l1)
    for pool in l2:
        for x in it1:
            if x in pool:
                break
        else:
            return
    return True

def subseq_Alain(A, S):
    if not S: return True          # all sets matched
    for i,n in enumerate(A):       # check for membership in 1st set    
        if n in S[0] and subseq_Alain(A[i+1:],S[1:]): # and matching rest
            return True            # found full match
    return False

def subseq_Alain_faster(A, S):
    if not S: return True          # all sets matched
    for i,n in enumerate(A):       # check for membership in 1st set    
        if n in S[0]:
            return subseq_Alain_faster(A[i+1:],S[1:]) # and matching rest
    return False

def benchmark(funcs, args, number):
    for _ in range(3):
        for func in funcs:
            t = timeit(lambda: func(*args), number=number) / number
            print('%8.3f ms ' % (t * 1e3), func.__name__)
        print()

l1 = list(range(40))
l2 = [{i, i+1} for i in range(0, 42, 2)]
funcs = [
    subseq_product,
    subseq_all_any,
    subseq_all_not_disjoint,
    subseq_not_any_disjoint,
    subseq_map_disjoint,
    subseq_loops,
    subseq_Alain,
    subseq_Alain_faster,
]
benchmark(funcs, (l1, l2), 1)

l1 = list(range(1000))
l2 = [{i, i+1} for i in range(0, 1002, 2)]
funcs = [
    subseq_all_any,
    subseq_all_not_disjoint,
    subseq_not_any_disjoint,
    subseq_map_disjoint,
    subseq_loops,
    subseq_Alain_faster,
]
benchmark(funcs, (l1, l2), 500)

Upvotes: 3

Kelly Bundy
Kelly Bundy

Reputation: 27629

Witness Versions

Instead of just knowing whether there's a match, one might want to also know what that match is. Maybe to truly use it, or maybe just to be able to verify the correctness. Here are a few versions, returning a list of the actual matched elements, or None if there's no match. The first two might be the best.

Perhaps the simplest, assume each pool from l2 does have a match in l1, and catch the exception:

def subseq1(l1, l2):
    it1 = iter(l1)
    try:
        return [next(x for x in it1 if x in pool)
                for pool in l2]
    except StopIteration:
        pass

Basic loops:

def subseq2(l1, l2):
    witness = []
    it1 = iter(l1)
    for pool in l2:
        for x in it1:
            if x in pool:
                witness.append(x)
                break
        else:
            return
    return witness

Slightly modifying my original if(all(any(... solution to append each witnessing element:

def subseq3(l1, l2):
    witness = []
    it1 = iter(l1)
    if all(any(x in pool and not witness.append(x)
               for x in it1)
           for pool in l2):
        return witness

Pre-append a spot for the next witness element:

def subseq4(l1, l2):
    witness = []
    it1 = iter(l1)
    if all(any(witness[-1] in pool
               for witness[-1] in it1)
           for pool in l2
           if not witness.append(None)):
        return witness

Pre-allocate spots for all witness elements:

def subseq5(l1, l2):
    witness = [None] * len(l2)
    it1 = iter(l1)
    if all(any(witness[i] in pool
               for witness[i] in it1)
           for i, pool in enumerate(l2)):
        return witness

Check for false witnesses at the end:

def subseq6(l1, l2):
    it1 = iter(l1)
    false = object()
    witness = [next((x for x in it1 if x in pool), false)
               for pool in l2]
    if false not in witness:
        return witness

Testing code (cases taken from Alain's answer):

funcs = [
    subseq1,
    subseq2,
    subseq3,
    subseq4,
    subseq5,
    subseq6,
]

for func in funcs:
    l1 = [1, 2, 3, 4]
    l2 = [{2, 1}, {1, 3}]
    print(func(l1,l2)) # True

    l1 = [1, 2, 3, 4]
    l2 = [{2, 1}, {1, 3}, {3, 4}]
    print(func(l1,l2)) # True

    l1 = [1, 2, 4, 3]
    l2 = [{2, 1}, {1, 3}, {3, 4}]
    print(func(l1,l2)) # False

    print()

Upvotes: 1

Alain T.
Alain T.

Reputation: 42133

If your list of sets can have more than two items, perhaps a recursive function would be best:

def subMatch(A,S):
    if not S: return True          # all sets matched
    for i,n in enumerate(A):       # check for membership in 1st set    
        if n in S[0] and subMatch(A[i+1:],S[1:]): # and matching rest
            return True            # found full match
    return False

Output:

l1 = [1, 2, 3, 4]

l2 = [{2, 1}, {1, 3}]

print(subMatch(l1,l2)) # True

l1 = [1, 2, 3, 4]

l2 = [{2, 1}, {1, 3}, {3, 4}]

print(subMatch(l1,l2)) # True

l1 = [1, 2, 4, 3]

l2 = [{2, 1}, {1, 3}, {3, 4}]

print(subMatch(l1,l2)) # False

Upvotes: 1

Related Questions