Richard Riverlands
Richard Riverlands

Reputation: 33

Python: how to count the occurrence of specific pattern with overlap in a list or string?

My problem is logically easy, but hard to implement for me. I have a list of numbers,(or you could say a string of numbers, it is not hard to transfer between strings and lists) I'd like to count the occurrence of some specific patterns with overlap. For example, the code is below:

A = [0, 1, 2 ,4, 5, 8, 4, 4, 5, 8, 2, 4, 4, 5, 5, 8, 9, 10, 3, 2]

For "4,5,8" occurs, then I count a1 = 1, a2 = 1, a3 = 1. For "4,4,5,8" occurs, then I count a1 = 2, a2 = 1, a3 = 1. For "4,4,5,5,5,8,8,8" occurs, I count a1 = 2, a2 = 3, a3 = 3. That is to say, for a pattern, you count if the pattern at least include "4,5,8" in this order. "4,5,9" doesn't count. "4,4,4,5,5,2,8" doesn't count at all. For "4,5,4,5,8", a1 = 1, a2 = 1, a3 = 1.

Thank you all for your help.

Upvotes: 0

Views: 414

Answers (2)

Richard Riverlands
Richard Riverlands

Reputation: 33

I tried to use

re.findall(r'26+8',test_string)

This will output the substring like "266666668", "268", "26668" in a nonoverlapping way. However, what if I want to search if there were a pattern shown below: "2(6+8+)+7" (this syntax doesn't work in "re", what I want essentially is a pattern like "266868688887", in which you could see there is back and forth movement between 6 and 8. Once you reach 7, the search is done. Has anyone a right idea to express the pattern is "re"? Thanks!

Upvotes: 0

SAMBECK
SAMBECK

Reputation: 36

You can match patterns like this using regular expressions.

https://regexr.com/ is a super helpful tool for experimenting with/learning about regular expressions.

The built in module re does the job:

import re

def make_regex_object(list_of_chars):
    # make a re object out of [a, b... n]: 'a+b+ ... n+''    (note the '+' at end)
    # the '+' means it matches one or more occurrence of each character
    return re.compile('+'.join([str(char) for char in list_of_chars]) + '+')

searcher = make_regex_object(['a', 'b', 'c', 'd'])
searcher.pattern  # 'a+b+c+d+'

x = searcher.search('abczzzzabbbcddefaaabbbccceeabc')   
# caution - search only matches first instance of pattern

print(x)   # <_sre.SRE_Match object; span=(7, 14), match='abbbcdd'>
x.end()    # 14
x.group()  # 'abbbcdd'

You could then repeat this on the remainder of your string if you want to count multiple instances of the pattern. You can count character occurrences with x.group().count(char) or something better.

Upvotes: 1

Related Questions