Reputation: 3534
I want to write a function that determines if a sublist exists in a larger list.
list1 = [1,0,1,1,1,0,0]
list2 = [1,0,1,0,1,0,1]
#Should return true
sublistExists(list1, [1,1,1])
#Should return false
sublistExists(list2, [1,1,1])
Is there a Python function that can do this?
Upvotes: 43
Views: 28691
Reputation: 15374
The function proposed in Nas Banov's answer has a fundamental issue: it creates many lists that are only used to compare portions of the original list.
Specifically, when one writes
(sublst == lst[i: i + n]) for i in range(len(lst) - n + 1)
then len(lst) - n + 1 lists of length n are created (in the worst case scenario, i.e. when the sublist can't be found in the original list).
There are cases in which this approach could become very slow, especially when the lists to be created are large (i.e., when the sublist is large). For such cases, this implementation is much faster:
def is_sublist(sub_lst, lst):
    n = len(sub_lst)
    return any(
        all(lst[i + j] == sub_lst[j] for j in range(n))
        for i in range(len(lst) - n + 1)
    )
Let's do a comparison of these two functions (is_sublist_a is the one originally proposed by Nas Banov, is_sublist_b is the one I'm proposing):
def is_sublist_a(sub_lst, lst):
    n = len(sub_lst)
    return any(
        sub_lst == lst[i: i + n]
        for i in range(len(lst) - n + 1)
    )

def is_sublist_b(sub_lst, lst):
    n = len(sub_lst)
    return any(
        all(lst[i + j] == sub_lst[j] for j in range(n))
        for i in range(len(lst) - n + 1)
    )
Let's have a look at the worst case scenario (the sublist does not exist). This is the function I use to measure the time a function needs to return its result:
from time import time

def exec_time(is_sublist, lst, sub_lst):
    start = time()
    _ = is_sublist(sub_lst, lst)
    return time() - start
We can see that, for small sublists, the first function is faster:
>>> exec_time(is_sublist_a, [0] * 10**7, [1] * 10)
1.7239012718200684
>>> exec_time(is_sublist_a, [0] * 10**7, [1] * 30)
2.3223540782928467
>>> exec_time(is_sublist_a, [0] * 10**7, [1] * 50)
3.017274856567383
>>> exec_time(is_sublist_b, [0] * 10**7, [1] * 10)
5.492832899093628
>>> exec_time(is_sublist_b, [0] * 10**7, [1] * 30)
5.4729719161987305
>>> exec_time(is_sublist_b, [0] * 10**7, [1] * 50)
5.4685280323028564
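As you can easily notice, function is_sublist_a is faster here. But you can also notice that, in this benchmark, the speed of is_sublist_b barely depends on the length of the sublist, while is_sublist_a's speed does. The reason is that every comparison fails on the first element, so all() stops immediately, whereas is_sublist_a builds a full slice at every position.
So it's easy to show that is_sublist_b is much faster than is_sublist_a for larger sublists: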
>>> exec_time(is_sublist_a, [0] * 10**7, [1] * 500)
15.868159055709839
>>> exec_time(is_sublist_a, [0] * 10**7, [1] * 1000)
29.75873899459839
>>> exec_time(is_sublist_b, [0] * 10**7, [1] * 500)
5.8182408809661865
>>> exec_time(is_sublist_b, [0] * 10**7, [1] * 1000)
6.155586004257202
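The timings above use a list of zeros and a sublist of ones, so each element-wise check fails at once. As a rough sketch of how one might also measure the case where the mismatch comes late (a sublist of zeros ending in a single 1), here is a small harness built on the standard timeit module. The helper name compare and the list sizes are my own assumptions, not part of the measurements above, and no timings are claimed here.

```python
from timeit import timeit


def is_sublist_a(sub_lst, lst):
    n = len(sub_lst)
    return any(sub_lst == lst[i:i + n] for i in range(len(lst) - n + 1))


def is_sublist_b(sub_lst, lst):
    n = len(sub_lst)
    return any(
        all(lst[i + j] == sub_lst[j] for j in range(n))
        for i in range(len(lst) - n + 1)
    )


def compare(lst, sub_lst, number=3):
    """Hypothetical helper: average seconds per call for both functions."""
    return {
        f.__name__: timeit(lambda: f(sub_lst, lst), number=number) / number
        for f in (is_sublist_a, is_sublist_b)
    }


if __name__ == "__main__":
    lst = [0] * 5000
    for n in (10, 100):
        early = [1] * n              # mismatch on the first element
        late = [0] * (n - 1) + [1]   # mismatch only on the last element
        print(n, "early", compare(lst, early))
        print(n, "late ", compare(lst, late))
```

With the late-mismatch input, is_sublist_b has to walk almost the whole sublist at every position, so its running time should no longer be independent of the sublist length.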
Upvotes: 0
Reputation: 29019
Let's get a bit functional, shall we? :)
def contains_sublist(lst, sublst):
    n = len(sublst)
    return any((sublst == lst[i:i+n]) for i in range(len(lst)-n+1))
Note that any() will stop on the first match of sublst within lst, or fail if there is no match, after O(m*n) ops.
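For illustration, here is that function applied to the lists from the question, plus two edge cases. This is only a usage sketch and assumes both arguments are plain lists (a list never compares equal to a tuple slice, for example).

```python
def contains_sublist(lst, sublst):
    n = len(sublst)
    return any(sublst == lst[i:i + n] for i in range(len(lst) - n + 1))


list1 = [1, 0, 1, 1, 1, 0, 0]
list2 = [1, 0, 1, 0, 1, 0, 1]

print(contains_sublist(list1, [1, 1, 1]))  # True
print(contains_sublist(list2, [1, 1, 1]))  # False
print(contains_sublist(list1, []))         # True: the empty list matches at index 0
print(contains_sublist([1], [1, 1]))       # False: range() is empty, so any() sees nothing
```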
Upvotes: 47
Reputation:
The efficient way to do this is to use the Boyer-Moore algorithm, as Mark Byers suggests. I have done it already here: Boyer-Moore search of a list for a sub-list in Python, but will paste the code here. It's based on the Wikipedia article.
The search() function returns the index of the sub-list being searched for, or -1 on failure.
def search(haystack, needle):
    """
    Search list `haystack` for sublist `needle`.
    """
    if len(needle) == 0:
        return 0
    char_table = make_char_table(needle)
    offset_table = make_offset_table(needle)
    i = len(needle) - 1
    while i < len(haystack):
        j = len(needle) - 1
        while needle[j] == haystack[i]:
            if j == 0:
                return i
            i -= 1
            j -= 1
        # Items absent from the table allow a shift of the full needle length;
        # without the default, .get() returns None and max() fails on Python 3.
        i += max(offset_table[len(needle) - 1 - j],
                 char_table.get(haystack[i], len(needle)))
    return -1

def make_char_table(needle):
    """
    Makes the jump table based on the mismatched character information.
    """
    table = {}
    for i in range(len(needle) - 1):
        table[needle[i]] = len(needle) - 1 - i
    return table

def make_offset_table(needle):
    """
    Makes the jump table based on the scan offset in which mismatch occurs.
    """
    table = []
    last_prefix_position = len(needle)
    for i in reversed(range(len(needle))):
        if is_prefix(needle, i + 1):
            last_prefix_position = i + 1
        table.append(last_prefix_position - i + len(needle) - 1)
    for i in range(len(needle) - 1):
        slen = suffix_length(needle, i)
        table[slen] = len(needle) - 1 - i + slen
    return table

def is_prefix(needle, p):
    """
    Is needle[p:end] a prefix of needle?
    """
    j = 0
    for i in range(p, len(needle)):
        if needle[i] != needle[j]:
            return False
        j += 1
    return True

def suffix_length(needle, p):
    """
    Returns the maximum length of the substring ending at p that is a suffix.
    """
    length = 0
    j = len(needle) - 1
    for i in reversed(range(p + 1)):
        if needle[i] == needle[j]:
            length += 1
        else:
            break
        j -= 1
    return length
Here is the example from the question:
def main():
    list1 = [1,0,1,1,1,0,0]
    list2 = [1,0,1,0,1,0,1]
    index = search(list1, [1, 1, 1])
    print(index)
    index = search(list2, [1, 1, 1])
    print(index)

if __name__ == '__main__':
    main()
Output:
2
-1
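If the full Boyer-Moore tables feel like too much, a simplified relative, the Boyer-Moore-Horspool variant, keeps only the bad-character shift. The sketch below is not the code above but a shorter alternative; the name horspool_search is my own, and it assumes the list items are hashable (they are used as dictionary keys) and that both arguments are lists.

```python
def horspool_search(haystack, needle):
    """Return the index of the first occurrence of needle in haystack, or -1."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    # For every item except the last one in the needle, remember how far
    # its last occurrence is from the end of the needle.
    shift = {}
    for k in range(m - 1):
        shift[needle[k]] = m - 1 - k
    i = 0
    while i <= n - m:
        if haystack[i:i + m] == needle:
            return i
        # Shift by the table entry for the item aligned with the needle's
        # last position; unknown items allow a full-length shift.
        i += shift.get(haystack[i + m - 1], m)
    return -1


print(horspool_search([1, 0, 1, 1, 1, 0, 0], [1, 1, 1]))  # 2
print(horspool_search([1, 0, 1, 0, 1, 0, 1], [1, 1, 1]))  # -1
```

The worst case is still O(n*m), but when the haystack contains many items that never appear in the needle, most windows are skipped entirely.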
Upvotes: 4
Reputation: 2948
I know this might not be quite relevant to the original question, but it might be a very elegant one-line solution for someone else if the order of the items in both lists doesn't matter. The result below will be True if all List1 elements are in List2 (regardless of order). If the order matters, don't use this solution.
List1 = [10, 20, 30]
List2 = [10, 20, 30, 40]
result = set(List1).intersection(set(List2)) == set(List1)
print(result)
Output
True
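To make the limits of the set approach concrete, here is a small sketch using the subset operator, which is equivalent to the intersection test above. The function name all_items_present is hypothetical, and the items are assumed to be hashable. Note that both order and duplicates are ignored, so this is a membership check rather than a sublist check.

```python
def all_items_present(small, big):
    # set(small) <= set(big) is the same test as
    # set(small).intersection(set(big)) == set(small)
    return set(small) <= set(big)


print(all_items_present([10, 20, 30], [10, 20, 30, 40]))  # True
print(all_items_present([30, 10], [10, 20, 30, 40]))      # True: order is ignored
print(all_items_present([10, 10], [10, 20]))              # True: duplicates are ignored
print(all_items_present([10, 50], [10, 20, 30, 40]))      # False
```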
Upvotes: -2
Reputation: 508
My favourite simple solution is the following (however, it's brute force, so I don't recommend it on huge data):
>>> l1 = ['z','a','b','c']
>>> l2 = ['a','b']
>>> any(l1[i:i+len(l2)] == l2 for i in range(len(l1)))
True
The code above creates a slice of l1 of length len(l2) starting at each index (the last ones are shorter) and compares each of them with l2 in turn.
Read this explanation only if you don't understand how it works (and want to know); otherwise there is no need to read it.
Firstly, this is how you can iterate over indexes of l1 items:
>>> [i for i in range(len(l1))]
[0, 1, 2, 3]
So, because i represents the index of an item in l1, you can use it to show the actual item instead of the index number:
>>> [l1[i] for i in range(len(l1))]
['z', 'a', 'b', 'c']
Then create slices (something like a subselection of items from a list) from l1 with a length of 2 (the length of l2):
>>> [l1[i:i+len(l2)] for i in range(len(l1))]
[['z', 'a'], ['a', 'b'], ['b', 'c'], ['c']] #last one is shorter, because there is no next item.
Now you can compare each slice with l2, and you see that the second one matches:
>>> [l1[i:i+len(l2)] == l2 for i in range(len(l1))]
[False, True, False, False] #notice that the second one is that matching one
Finally, with the built-in function any, you can check whether at least one of the booleans is True:
>>> any(l1[i:i+len(l2)] == l2 for i in range(len(l1)))
True
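A small variation on the same idea, as a sketch: stopping the range at len(l1) - len(l2) + 1 avoids building the shorter slices at the end, and next() with a default returns the position of the first match instead of just True or False. The name find_sublist is my own.

```python
def find_sublist(l1, l2):
    """Return the index of the first occurrence of l2 in l1, or -1."""
    n = len(l2)
    # Only start positions where a full-length slice still fits.
    starts = range(len(l1) - n + 1)
    return next((i for i in starts if l1[i:i + n] == l2), -1)


l1 = ['z', 'a', 'b', 'c']
print(find_sublist(l1, ['a', 'b']))  # 1
print(find_sublist(l1, ['c', 'z']))  # -1
```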
Upvotes: 4
Reputation: 295
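def sublist(l1, l2):
    # Return True if l1 occurs as a contiguous run inside l2.
    if len(l1) > len(l2):
        return False
    for i in range(len(l2) - len(l1) + 1):
        for j in range(len(l1)):
            if l2[i + j] != l1[j]:
                break
        else:
            # The inner loop finished without a mismatch.
            return True
    return False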
Upvotes: 0
Reputation: 23763
Might as well throw in a recursive version of @NasBanov's solution
def foo(sub, lst):
    '''Checks if sub is in lst.
    Expects both arguments to be lists
    '''
    if len(lst) < len(sub):
        return False
    return sub == lst[:len(sub)] or foo(sub, lst[1:])
Upvotes: 0
Reputation: 27476
def sublistExists(x, y):
    if not y:
        return True  # an empty sublist is trivially present
    # Candidate start positions: every index where x matches y's first item.
    occ = [i for i, a in enumerate(x) if a == y[0]]
    for b in occ:
        if x[b:b+len(y)] == y:
            print('YES-- SUBLIST at : ', b)
            return True
    print('NO SUBLIST')
    return False

list1 = [1,0,1,1,1,0,0]
list2 = [1,0,1,0,1,0,1]
#should return True
sublistExists(list1, [1,1,1])
#Should return False
sublistExists(list2, [1,1,1])
Upvotes: 1
Reputation: 838706
If you are sure that your inputs will only contain the single digits 0 and 1 then you can convert to strings:
def sublistExists(list1, list2):
    return ''.join(map(str, list2)) in ''.join(map(str, list1))
This creates two strings so it is not the most efficient solution but since it takes advantage of the optimized string searching algorithm in Python it's probably good enough for most purposes.
If efficiency is very important you can look at the Boyer-Moore string searching algorithm, adapted to work on lists.
A naive search has O(n*m) worst case but can be suitable if you cannot use the converting to string trick and you don't need to worry about performance.
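To show why the single-digit restriction matters, the sketch below reproduces the string trick and a false positive it gives for multi-digit items. It also shows a related variant that is my own assumption rather than part of the answer above: on Python 3, when every item is an integer in range(256), the lists can be converted with bytes() and searched with the same in operator, which keeps item boundaries intact. The function names are hypothetical.

```python
def sublist_exists_str(list1, list2):
    return ''.join(map(str, list2)) in ''.join(map(str, list1))


def sublist_exists_bytes(list1, list2):
    # Assumes Python 3 and that every item is an int in range(256);
    # bytes() raises an error for anything else.
    return bytes(list2) in bytes(list1)


print(sublist_exists_str([1, 0, 1, 1, 1, 0, 0], [1, 1, 1]))    # True
print(sublist_exists_str([11, 0], [1, 1]))                     # True, but wrong: '11' in '110'
print(sublist_exists_bytes([1, 0, 1, 1, 1, 0, 0], [1, 1, 1]))  # True
print(sublist_exists_bytes([11, 0], [1, 1]))                   # False
```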
Upvotes: 22
Reputation: 2959
If I am understanding this correctly, you have a larger list, like:
list_A= ['john', 'jeff', 'dave', 'shane', 'tim']
Then there are other lists:
list_B= ['sean', 'bill', 'james']
list_C= ['cole', 'wayne', 'jake', 'moose']
Then I append the lists B and C to list A:
list_A.append(list_B)
list_A.append(list_C)
So when I print list_A
print (list_A)
I get the following output:
['john', 'jeff', 'dave', 'shane', 'tim', ['sean', 'bill', 'james'], ['cole', 'wayne', 'jake', 'moose']]
Now I want to check whether a sublist exists:
for value in list_A:
    # isinstance() tests the type directly instead of parsing its string form
    if isinstance(value, list):
        print("True")
    else:
        print("False")
This prints True for each element of list_A that is itself a list, and False for every other element.
Upvotes: -4
Reputation: 304355
Here is a way that will work for simple lists that is slightly less fragile than Mark's
def sublistExists(haystack, needle):
    def munge(s):
        return ", "+format(str(s)[1:-1])+","
    return munge(needle) in munge(haystack)
Upvotes: 1
Reputation: 4840
No function that I know of
def sublistExists(list, sublist):
    for i in range(len(list)-len(sublist)+1):
        if sublist == list[i:i+len(sublist)]:
            return True #return position (i) if you wish
    return False #or -1
As Mark noted, this is not the most efficient search (it's O(n*m)). This problem can be approached in much the same way as string searching.
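As one example of carrying a string-searching technique over to lists, here is a sketch of the Knuth-Morris-Pratt algorithm, which runs in O(n + m) and only needs the items to support == and !=. It is a different algorithm from the loop above, and the name kmp_search is my own.

```python
def kmp_search(haystack, needle):
    """Return the index of the first occurrence of needle in haystack, or -1."""
    m = len(needle)
    if m == 0:
        return 0
    # failure[i] is the length of the longest proper prefix of
    # needle[:i + 1] that is also a suffix of it.
    failure = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and needle[i] != needle[k]:
            k = failure[k - 1]
        if needle[i] == needle[k]:
            k += 1
        failure[i] = k
    # Scan the haystack once; k is the number of needle items matched so far.
    k = 0
    for i, item in enumerate(haystack):
        while k > 0 and item != needle[k]:
            k = failure[k - 1]
        if item == needle[k]:
            k += 1
        if k == m:
            return i - m + 1
    return -1


print(kmp_search([1, 0, 1, 1, 1, 0, 0], [1, 1, 1]))  # 2
print(kmp_search([1, 0, 1, 0, 1, 0, 1], [1, 1, 1]))  # -1
```

Because the haystack is only read through enumerate(), this version also works when the haystack is any iterable, not just a list.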
Upvotes: 4