aerain
aerain

Reputation: 591

Finding patterns in a list

I'm trying to write a python script to find patterns in a list.

Eg. Given this list

[1,2,3,4,5,6,4,5,6,4,5,6,4,5,6]

The script would determine that 4,5,6 ocurred 3 times and then print out

3 (4,5,6)

I was hoping if anyone had any insights algorithmically (I can only think of n^2 algorithms where I check for patterns of size 1, then 2, then 3, etc iterating through the string each time) or if there might be any Python built-in libraries which might help do the same things. Thanks!

Upvotes: 3

Views: 17106

Answers (3)

jcbsv
jcbsv

Reputation: 466

Here is a function providing a solution to the pattern matching problem:

import itertools

def pattern_match(pattern, sequence):
    """Count the number of times that pattern occurs in the sequence."""
    pattern = tuple(pattern)
    k = len(pattern)

    # create k iterators for the sequence
    i = itertools.tee(sequence, k)

    # advance the iterators
    for j in range(k):
        for _ in range(j):
            next(i[j])

    count = 0
    for q in zip(*i):
        if pattern == q:
            count += 1

    return count

To solve the stated problem, call with:

p = [4, 5, 6]
l = [1, 2, 3, 4, 5, 6, 4, 5, 6, 4, 5, 6, 4, 5, 6]
count = pattern_match(p, l)

Here is a Gist with the complete code solving the example problem.

(I believe the correct answer is that the pattern repeats 4 times, not 3 times, as stated in the question.)

I'm not sure if the complexity of this algorithm is actually less than O(n^2).

Upvotes: 2

Luke
Luke

Reputation: 11644

Off the top of my head, I would do this:

  1. start with two copies of the list A and B.
  2. pop the first value off of B
  3. subtract B from A: C = A-B
  4. search for areas in C that are 0; these indicate repeated strings
  5. add repeated strings into a dict which tracks each string and the number of times it has been seen
  6. repeat steps 2-5 until B is empty.

Upvotes: 1

user177800
user177800

Reputation:

The algorithm you are looking for is Run Length Encoding. The basic principles of that algorithm will give you how to approach detecting patterns in a sequence and counting them.

Run-length encoding (RLE) is a very simple form of data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run.

Here is a relevant article on how to write an RLE program in Python.

Upvotes: 1

Related Questions