David Wolever
David Wolever

Reputation: 154564

Is there a Python builtin for determining if an iterable contained a certain sequence?

For example, something like:

>>> [1, 2, 3].contains_sequence([1, 2])
True
>>> [1, 2, 3].contains_sequence([4])
False

I know that the in operator can do this for strings:

>>> "12" in "123"
True

But I'm looking for something that operates on iterables.

Upvotes: 5

Views: 468

Answers (8)

Edward Loper
Edward Loper

Reputation: 15944

As others have said, there's no builtin for this. Here's an implementation that is potentially more efficient than the other answers I've seen -- in particular, it scans through the iterable, just keeping track of what prefix sizes of the target sequence it's seen. But that increased efficiency comes at some expense in increased verbosity over some of the other approaches that have been suggested.

def contains_seq(iterable, seq):
    """
    Returns true if the iterable contains the given sequence.
    """
    # The following clause is optional -- leave it if you want to allow `seq` to
    # be an arbitrary iterable; or remove it if `seq` will always be list-like.
    if not isinstance(seq, collections.Sequence):
        seq = tuple(seq)

    if len(seq)==0: return True # corner case

    partial_matches = []
    for elt in iterable:
        # Try extending each of the partial matches by adding the
        # next element, if it matches.
        partial_matches = [m+1 for m in partial_matches if elt == seq[m]]
        # Check if we should start a new partial match
        if elt==seq[0]:
            partial_matches.append(1)
        # Check if we have a complete match (partial_matches will always
        # be sorted from highest to lowest, since older partial matches 
        # come before newer ones).
        if partial_matches and partial_matches[0]==len(seq):
            return True
    # No match found.
    return False

Upvotes: 2

georg
georg

Reputation: 215009

deque appears to be useful here:

from collections import deque

def contains(it, seq):
    seq = deque(seq)
    deq = deque(maxlen=len(seq))
    for p in it:
        deq.append(p)
        if deq == seq:
            return True
    return False

Note that this accepts arbitrary iterables for both arguments (no slicing required).

Upvotes: 2

JBernardo
JBernardo

Reputation: 33407

As there's no builtin, I made a nice version:

import itertools as it

def contains(seq, sub):
    seq = iter(seq)
    o = object()
    return any(all(i==j for i,j in zip(sub, it.chain((n,),seq, 
                                      (o for i in it.count())))) for n in seq)

This do not require any extra lists (if you use it.izip or Py3k).

>>> contains([1,2,3], [1,2])
True
>>> contains([1,2,3], [1,2,3])
True
>>> contains([1,2,3], [2,3])
True
>>> contains([1,2,3], [2,3,4])
False

Extra points if you have no trouble reading it. (It does the job, but the implementation is not to be taked too seriously). ;)

Upvotes: 1

Aleksei astynax Pirogov
Aleksei astynax Pirogov

Reputation: 2491

If preserving of order is not necessary, you can use sets (builtin):

>>> set([1,2]).issubset([1,2,3])
True
>>> set([4]).issubset([1,2,3])
False

Otherwise:

def is_subsequence(sub, iterable):
    sub_pos, sub_len = 0, len(sub)
    for i in iterable:
        if i == sub[sub_pos]:
            sub_pos += 1
            if sub_pos >= sub_len:
                return True
        else:
            sub_pos = 0
    return False

>>> is_subsequence([1,2], [0,1,2,3,4])
True
>>> is_subsequence([2,1], [0,1,2,3,4]) # order preserved
False
>>> is_subsequence([1,2,4], [0,1,2,3,4])
False

This one works with any iterator.

Upvotes: 2

BrenBarn
BrenBarn

Reputation: 251428

Is there a Python builtin? No. You can accomplish this task in various ways. Here is a recipe that does it, and also gives you the position of the subsequence in the containing sequence:

def _search(forward, source, target, start=0, end=None):
    """Naive search for target in source."""
    m = len(source)
    n = len(target)
    if end is None:
        end = m
    else:
        end = min(end, m)
    if n == 0 or (end-start) < n:
        # target is empty, or longer than source, so obviously can't be found.
        return None
    if forward:
        x = range(start, end-n+1)
    else:
        x = range(end-n, start-1, -1)
    for i in x:
        if source[i:i+n] == target:
            return i
    return None

Upvotes: 3

mgilson
mgilson

Reputation: 310049

As far as I know, there's no way to do this. You can roll your own function pretty easily, but I doubt that will be terribly efficient.

>>> def contains_seq(seq,subseq):
...     #try: junk=seq[:]
...     #except: seq=tuple(seq)
...     #try: junk=subseq[:]
...     #except: subseq=tuple(subseq)
...     ll=len(subseq)
...     for i in range(len(seq)-ll):  #on python2, use xrange.
...         if(seq[i:i+ll] == subseq):
...             return True
...     return False
...
>>> contains_seq(range(10),range(3)) #True
>>> contains_seq(range(10),[2,3,6]) #False

Note that this solution does not work with generator type objects (it only works on objects that you can slice). You could check seq to see if it is sliceable before proceeding and cast to a tuple if it isn't sliceable -- But then you get rid of the benefits of slicing. You could re-write it to check one element at a time instead of using slicing, but I have a feeling performance would suffer even more.

Upvotes: 2

monkut
monkut

Reputation: 43860

Referenced from https://stackoverflow.com/a/6822773/24718 modified to use a list.

from itertools import islice

def window(seq, n=2):
    """
    Returns a sliding window (of width n) over data from the iterable
    s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   
    """
    it = iter(seq)
    result = list(islice(it, n))
    if len(result) == n:
        yield result    
    for elem in it:
        result = result[1:] + [elem]
        yield result

def contains_sequence(all_values, seq):
    return any(seq == current_seq for current_seq in window(all_values, len(seq)))            

test_iterable = [1,2,3]
search_sequence = [1,2]

result = contains_sequence(test_iterable, search_sequence)

Upvotes: 4

Yunchi
Yunchi

Reputation: 5537

You could convert it into a string and then do matching on it

full_list = " ".join([str(x) for x in [1, 2, 3]])
seq = " ".join([str(x) for x in [1, 2]])
seq in full_list

Upvotes: -2

Related Questions