Reputation: 591
I'm trying to write a python script to find patterns in a list.
Eg. Given this list
[1,2,3,4,5,6,4,5,6,4,5,6,4,5,6]
The script would determine that 4,5,6 ocurred 3 times and then print out
3 (4,5,6)
I was hoping if anyone had any insights algorithmically (I can only think of n^2 algorithms where I check for patterns of size 1, then 2, then 3, etc iterating through the string each time) or if there might be any Python built-in libraries which might help do the same things. Thanks!
Upvotes: 3
Views: 17106
Reputation: 466
Here is a function providing a solution to the pattern matching problem:
import itertools
def pattern_match(pattern, sequence):
"""Count the number of times that pattern occurs in the sequence."""
pattern = tuple(pattern)
k = len(pattern)
# create k iterators for the sequence
i = itertools.tee(sequence, k)
# advance the iterators
for j in range(k):
for _ in range(j):
next(i[j])
count = 0
for q in zip(*i):
if pattern == q:
count += 1
return count
To solve the stated problem, call with:
p = [4, 5, 6]
l = [1, 2, 3, 4, 5, 6, 4, 5, 6, 4, 5, 6, 4, 5, 6]
count = pattern_match(p, l)
Here is a Gist with the complete code solving the example problem.
(I believe the correct answer is that the pattern repeats 4 times, not 3 times, as stated in the question.)
I'm not sure if the complexity of this algorithm is actually less than O(n^2).
Upvotes: 2
Reputation: 11644
Off the top of my head, I would do this:
Upvotes: 1
Reputation:
The algorithm you are looking for is Run Length Encoding. The basic principles of that algorithm will give you how to approach detecting patterns in a sequence and counting them.
Run-length encoding (RLE) is a very simple form of data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run.
Here is a relevant article on how to write an RLE program in Python.
Upvotes: 1