Jonathan

Reputation: 3534

Check for presence of a sliced list in Python

I want to write a function that determines if a sublist exists in a larger list.

list1 = [1,0,1,1,1,0,0]
list2 = [1,0,1,0,1,0,1]

#Should return true
sublistExists(list1, [1,1,1])

#Should return false
sublistExists(list2, [1,1,1])

Is there a Python function that can do this?

Upvotes: 43

Views: 28691

Answers (12)

Riccardo Bucco

Reputation: 15374

The function proposed in Nas Banov's answer has a fundamental issue: it creates many lists that are only used to compare portions of the original list.

Specifically, when one writes

(sublst == lst[i: i + n]) for i in range(len(lst) - n + 1)

then len(lst) - n + 1 lists of length n are created (in the worst case scenario, i.e. when the sublist can't be found in the original list).

There are cases in which this approach could become very slow, especially when the lists to be created are large (i.e., when the sublist is large). For such cases, this implementation is much faster:

def is_sublist(sub_lst, lst):
    n = len(sub_lst)
    return any(
        all(lst[i + j] == sub_lst[j] for j in range(n))
        for i in range(len(lst) - n + 1)
    )

Let's do a comparison of these two functions (is_sublist_a is the one originally proposed by Nas Banov, is_sublist_b is the one I'm proposing):

def is_sublist_a(sub_lst, lst):
    n = len(sub_lst)
    return any(
        sub_lst == lst[i: i + n]
        for i in range(len(lst) - n + 1)
    )

def is_sublist_b(sub_lst, lst):
    n = len(sub_lst)
    return any(
        all(lst[i + j] == sub_lst[j] for j in range(n))
        for i in range(len(lst) - n + 1)
    )

Let's look at the worst-case scenario (the sublist does not exist). This is the function I use to measure the time a function needs to return its result:

from time import time

def exec_time(is_sublist, lst, sub_lst):
    start = time()
    _ = is_sublist(sub_lst, lst)
    return time() - start

We can see that, for small sublists, the first function is faster:

>>> exec_time(is_sublist_a, [0] * 10**7, [1] * 10)
1.7239012718200684
>>> exec_time(is_sublist_a, [0] * 10**7, [1] * 30)
2.3223540782928467
>>> exec_time(is_sublist_a, [0] * 10**7, [1] * 50)
3.017274856567383

>>> exec_time(is_sublist_b, [0] * 10**7, [1] * 10)
5.492832899093628
>>> exec_time(is_sublist_b, [0] * 10**7, [1] * 30)
5.4729719161987305
>>> exec_time(is_sublist_b, [0] * 10**7, [1] * 50)
5.4685280323028564

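As you can see, is_sublist_a is faster here. But notice that, on this input, the running time of is_sublist_b barely depends on the length of the sublist (every comparison fails at the first element, so all() stops immediately), while the running time of is_sublist_a clearly does, because it builds a full slice of length n before comparing.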

So it's easy to show that is_sublist_b is much faster than is_sublist_a for larger sublists:

>>> exec_time(is_sublist_a, [0] * 10**7, [1] * 500)
15.868159055709839
>>> exec_time(is_sublist_a, [0] * 10**7, [1] * 1000)
29.75873899459839

>>> exec_time(is_sublist_b, [0] * 10**7, [1] * 500)
5.8182408809661865
>>> exec_time(is_sublist_b, [0] * 10**7, [1] * 1000)
6.155586004257202
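
Before reading too much into the timings, it is worth confirming that the two variants return the same answers. This is a minimal sketch, assuming the two definitions above are copied unchanged; the random test data and the seed are my own arbitrary choices, not part of the benchmark:

```python
import random

def is_sublist_a(sub_lst, lst):
    # Slice-based variant: builds one temporary list per start index.
    n = len(sub_lst)
    return any(sub_lst == lst[i:i + n] for i in range(len(lst) - n + 1))

def is_sublist_b(sub_lst, lst):
    # Index-based variant: compares in place and stops at the first mismatch.
    n = len(sub_lst)
    return any(
        all(lst[i + j] == sub_lst[j] for j in range(n))
        for i in range(len(lst) - n + 1)
    )

random.seed(0)  # arbitrary seed, only to make the run repeatable
for _ in range(1000):
    lst = [random.randint(0, 1) for _ in range(random.randint(0, 12))]
    sub = [random.randint(0, 1) for _ in range(random.randint(0, 4))]
    # Both variants must agree on every random input.
    assert is_sublist_a(sub, lst) == is_sublist_b(sub, lst)
```

If the loop finishes without an AssertionError, the two functions agreed on every generated pair, including empty sublists and sublists longer than the list.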

Upvotes: 0

Nas Banov

Reputation: 29019

Let's get a bit functional, shall we? :)

def contains_sublist(lst, sublst):
    n = len(sublst)
    return any((sublst == lst[i:i+n]) for i in range(len(lst)-n+1))

Note that any() stops at the first match of sublst within lst. If there is no match, it returns False after O(m*n) operations, where m = len(lst) and n = len(sublst).
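
For reference, here is how that function behaves on the lists from the question. The function body is repeated only so the snippet runs on its own; the empty-list call is an extra case I added to show the edge behaviour:

```python
def contains_sublist(lst, sublst):
    n = len(sublst)
    return any(sublst == lst[i:i + n] for i in range(len(lst) - n + 1))

list1 = [1, 0, 1, 1, 1, 0, 0]
list2 = [1, 0, 1, 0, 1, 0, 1]

print(contains_sublist(list1, [1, 1, 1]))  # True
print(contains_sublist(list2, [1, 1, 1]))  # False
print(contains_sublist(list1, []))         # True: the empty list matches at index 0
```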

Upvotes: 47

user325117

Reputation:

The efficient way to do this is to use the Boyer-Moore algorithm, as Mark Byers suggests. I have done it already here: Boyer-Moore search of a list for a sub-list in Python, but will paste the code here. It's based on the Wikipedia article.

The search() function returns the index of the sub-list being searched for, or -1 on failure.

def search(haystack, needle):
    """
    Search list `haystack` for sublist `needle`.
    """
    if len(needle) == 0:
        return 0
    char_table = make_char_table(needle)
    offset_table = make_offset_table(needle)
    i = len(needle) - 1
    while i < len(haystack):
        j = len(needle) - 1
        while needle[j] == haystack[i]:
            if j == 0:
                return i
            i -= 1
            j -= 1
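        # Items missing from char_table shift by the full needle length
        # (a bare .get() would return None and break max() on Python 3).
        i += max(offset_table[len(needle) - 1 - j], char_table.get(haystack[i], len(needle)))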
    return -1

    
def make_char_table(needle):
    """
    Makes the jump table based on the mismatched character information.
    """
    table = {}
    for i in range(len(needle) - 1):
        table[needle[i]] = len(needle) - 1 - i
    return table
    
def make_offset_table(needle):
    """
    Makes the jump table based on the scan offset in which mismatch occurs.
    """
    table = []
    last_prefix_position = len(needle)
    for i in reversed(range(len(needle))):
        if is_prefix(needle, i + 1):
            last_prefix_position = i + 1
        table.append(last_prefix_position - i + len(needle) - 1)
    for i in range(len(needle) - 1):
        slen = suffix_length(needle, i)
        table[slen] = len(needle) - 1 - i + slen
    return table
    
def is_prefix(needle, p):
    """
    Is needle[p:end] a prefix of needle?
    """
    j = 0
    for i in range(p, len(needle)):
        if needle[i] != needle[j]:
            return False
        j += 1
    return True
    
def suffix_length(needle, p):
    """
    Returns the maximum length of the substring ending at p that is a suffix.
    """
    length = 0
    j = len(needle) - 1
    for i in reversed(range(p + 1)):
        if needle[i] == needle[j]:
            length += 1
        else:
            break
        j -= 1
    return length

Here is the example from the question:

def main():
    list1 = [1,0,1,1,1,0,0]
    list2 = [1,0,1,0,1,0,1]
    index = search(list1, [1, 1, 1])
    print(index)
    index = search(list2, [1, 1, 1])
    print(index)

if __name__ == '__main__':
    main()

Output:

2
-1

Upvotes: 4

Chadee Fouad

Reputation: 2948

I know this is not quite what the original question asks, but it may be an elegant one-line solution for someone else if the order of the items in both lists doesn't matter. The result below is True if every element of List1 is also in List2, regardless of order and of how many times each element occurs. If the order matters, don't use this solution.

List1 = [10, 20, 30]
List2 = [10, 20, 30, 40]
result = set(List1).intersection(set(List2)) == set(List1)
print(result)

Output

True
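
To make the difference from the original question concrete, here is a small sketch that runs the set-based test and a contiguous-run test on list2 from the question. The variable names are illustrative only:

```python
list2 = [1, 0, 1, 0, 1, 0, 1]
wanted = [1, 1, 1]

# Set-based test: only asks whether every distinct value occurs somewhere.
as_set = set(wanted).issubset(list2)

# Contiguous test: asks whether the exact run [1, 1, 1] occurs.
n = len(wanted)
as_run = any(list2[i:i + n] == wanted for i in range(len(list2) - n + 1))

print(as_set)  # True
print(as_run)  # False
```

The set-based test says True because the value 1 appears in list2, even though three consecutive 1s never do.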

Upvotes: -2

Jan Musil

Reputation: 508

My favourite simple solution is the following (it is brute force, however, so I don't recommend it for huge data):

>>> l1 = ['z','a','b','c']
>>> l2 = ['a','b']
>>> any(l1[i:i+len(l2)] == l2 for i in range(len(l1)))
True

The code above creates a slice of l1 starting at every index, each at most len(l2) items long, and compares the slices with l2 one after another.

Detailed explanation

Read this explanation only if you don't understand how it works (and want to know); otherwise there is no need to read it.

Firstly, this is how you can iterate over indexes of l1 items:

>>> [i for i in range(len(l1))]
[0, 1, 2, 3]

Because i represents the index of an item in l1, you can use it to show the actual item instead of the index number:

>>> [l1[i] for i in range(len(l1))]
['z', 'a', 'b', 'c']

Then create slices (a slice is a subselection of consecutive items from a list) of l1 with the length of l2:

>>> [l1[i:i+len(l2)] for i in range(len(l1))]
[['z', 'a'], ['a', 'b'], ['b', 'c'], ['c']] #last one is shorter, because there is no next item.

Now you can compare each slice with l2, and you see that the second one matches:

>>> [l1[i:i+len(l2)] == l2 for i in range(len(l1))]
[False, True, False, False] #notice that the second one is that matching one

Finally, with the built-in function any, you can check whether at least one of the booleans is True:

>>> any(l1[i:i+len(l2)] == l2 for i in range(len(l1)))
True
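
The walkthrough also shows that the last slice is too short to ever match. A possible refinement, sketched here with a hypothetical helper name starts, is to stop the range early so that only full-length slices are built:

```python
l1 = ['z', 'a', 'b', 'c']
l2 = ['a', 'b']

# Only start positions that leave room for a full-length slice.
starts = range(len(l1) - len(l2) + 1)

print([l1[i:i + len(l2)] for i in starts])
# [['z', 'a'], ['a', 'b'], ['b', 'c']]

print(any(l1[i:i + len(l2)] == l2 for i in starts))
# True
```

The result is the same as before; the shorter range just skips comparisons that cannot succeed.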

Upvotes: 4

Ashutosh K Singh

Reputation: 295

def sublist(l1, l2):
  # True if l1 occurs as a contiguous run inside l2.
  if len(l1) > len(l2):
    return False
  for i in range(len(l2) - len(l1) + 1):
    # Compare l1 against the window of l2 that starts at index i.
    if l2[i:i + len(l1)] == l1:
      return True
  return False

Upvotes: 0

wwii

Reputation: 23763

Might as well throw in a recursive version of @NasBanov's solution

def foo(sub, lst):
    '''Checks if sub is in lst.

    Expects both arguments to be lists
    '''
    if len(lst) < len(sub):
        return False
    return sub == lst[:len(sub)] or foo(sub, lst[1:])

Upvotes: 0

SuperNova

Reputation: 27476

def sublistExists(x, y):
    # Candidate start positions: every index where x matches y's first item.
    # Note: y must be non-empty, because y[0] is read here.
    occ = [i for i, a in enumerate(x) if a == y[0]]
    for b in occ:
        if x[b:b+len(y)] == y:
            print('YES-- SUBLIST at : ', b)
            return True
    print('NO SUBLIST')
    return False

list1 = [1,0,1,1,1,0,0]
list2 = [1,0,1,0,1,0,1]

#should return True
sublistExists(list1, [1,1,1])

#Should return False
sublistExists(list2, [1,1,1])

Upvotes: 1

Mark Byers

Reputation: 838706

If you are sure that your inputs will only contain the single digits 0 and 1 then you can convert to strings:

def sublistExists(list1, list2):
    return ''.join(map(str, list2)) in ''.join(map(str, list1))

This creates two strings so it is not the most efficient solution but since it takes advantage of the optimized string searching algorithm in Python it's probably good enough for most purposes.

If efficiency is very important you can look at the Boyer-Moore string searching algorithm, adapted to work on lists.

A naive search is O(n*m) in the worst case, but it can be suitable if you cannot use the string-conversion trick and you don't need to worry about performance.
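
To illustrate why the single-digit restriction matters, here is a small sketch. The function names are hypothetical, and the bytes-based variant is my own suggestion under the stated assumption, not part of the approach above:

```python
def sublist_exists_str(list1, list2):
    # The string-join approach described above.
    return ''.join(map(str, list2)) in ''.join(map(str, list1))

# Works for single digits:
print(sublist_exists_str([1, 0, 1, 1, 1, 0, 0], [1, 1, 1]))  # True

# False positive once a value has more than one digit:
# "23" is found inside "123" although [2, 3] is not a run of [12, 3].
print(sublist_exists_str([12, 3], [2, 3]))  # True

def sublist_exists_bytes(list1, list2):
    # Assumption: every item is an int in range(256), so each item maps
    # to exactly one byte and item boundaries cannot blur.
    return bytes(list2) in bytes(list1)

print(sublist_exists_bytes([1, 0, 1, 1, 1, 0, 0], [1, 1, 1]))  # True
print(sublist_exists_bytes([12, 3], [2, 3]))                   # False
```

The bytes variant still relies on the built-in substring search, but it raises ValueError for items outside 0..255, so it only fits data that meets that assumption.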

Upvotes: 22

Suhail

Reputation: 2959

If I am understanding this correctly, you have a larger list, like:

list_A= ['john', 'jeff', 'dave', 'shane', 'tim']

Then there are other lists:

list_B= ['sean', 'bill', 'james']

list_C= ['cole', 'wayne', 'jake', 'moose']

Then I append lists B and C to list A:

list_A.append(list_B)

list_A.append(list_C)

So when I print list_A:

print (list_A)

I get the following output:

['john', 'jeff', 'dave', 'shane', 'tim', ['sean', 'bill', 'james'], ['cole', 'wayne', 'jake', 'moose']]

Now I want to check whether a sublist exists:

for value in list_A:
    # isinstance() tests the type directly; no string parsing is needed.
    if isinstance(value, list):
        print("True")
    else:
        print("False")

This prints 'True' for every element of the larger list that is itself a list, and 'False' for every other element.
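
If a single yes/no answer is wanted instead of one line per element, the same check can be folded into any(). A minimal sketch, using a shortened example list of my own:

```python
list_A = ['john', 'jeff', ['sean', 'bill'], 'tim']

# True as soon as one element is itself a list.
has_nested = any(isinstance(value, list) for value in list_A)
print(has_nested)  # True

# The nested lists themselves, if they are needed.
nested = [value for value in list_A if isinstance(value, list)]
print(nested)  # [['sean', 'bill']]
```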

Upvotes: -4

John La Rooy

Reputation: 304355

Here is a way that will work for simple lists and is slightly less fragile than Mark's:

def sublistExists(haystack, needle):
    def munge(s):
        return ", " + str(s)[1:-1] + ","
    return munge(needle) in munge(haystack)
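
A short sketch of what the delimiters buy, and of one limit that remains. The function is repeated so the snippet runs on its own, and the inputs are my own examples:

```python
def sublistExists(haystack, needle):
    def munge(s):
        # "[12, 3]" -> ", 12, 3," so every item is wrapped in delimiters.
        return ", " + str(s)[1:-1] + ","
    return munge(needle) in munge(haystack)

print(sublistExists([12, 3], [2, 3]))   # False: ", 2, 3," is not in ", 12, 3,"
print(sublistExists([12, 3], [12, 3]))  # True

# Still text-based: 1 == 1.0 in Python, but their reprs differ.
print(sublistExists([1, 2], [1.0, 2]))  # False
```

So multi-digit numbers are handled, but items that compare equal while printing differently are not matched.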

Upvotes: 1

sas4740

Reputation: 4840

There is no built-in function for this that I know of, but it is short to write:

def sublistExists(lst, sublist):
    for i in range(len(lst)-len(sublist)+1):
        if sublist == lst[i:i+len(sublist)]:
            return True #return position (i) if you wish
    return False #or -1

As Mark noted, this is not the most efficient search (it's O(n*m)). This problem can be approached in much the same way as string searching.
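
As one illustration of borrowing from string searching, here is a sketch of the Knuth-Morris-Pratt idea applied to lists. It is a different algorithm from the Boyer-Moore search mentioned elsewhere in this thread, and the names build_failure and find_sublist are hypothetical:

```python
def build_failure(needle):
    # failure[k] = length of the longest proper prefix of needle[:k + 1]
    # that is also a suffix of it.
    failure = [0] * len(needle)
    k = 0
    for i in range(1, len(needle)):
        while k > 0 and needle[i] != needle[k]:
            k = failure[k - 1]
        if needle[i] == needle[k]:
            k += 1
        failure[i] = k
    return failure

def find_sublist(haystack, needle):
    # Returns the start index of the first match, or -1 if there is none.
    if not needle:
        return 0
    failure = build_failure(needle)
    k = 0  # number of needle items currently matched
    for i, item in enumerate(haystack):
        while k > 0 and item != needle[k]:
            k = failure[k - 1]
        if item == needle[k]:
            k += 1
        if k == len(needle):
            return i - k + 1
    return -1

print(find_sublist([1, 0, 1, 1, 1, 0, 0], [1, 1, 1]))  # 2
print(find_sublist([1, 0, 1, 0, 1, 0, 1], [1, 1, 1]))  # -1
```

This runs in O(n + m) comparisons and never slices the haystack, at the cost of more code than the simple loop above.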

Upvotes: 4
