smac89

Reputation: 43098

Python regex to match character a number of times

I am trying to create a regex that matches a character a specified number of times in a string. The occurrences do not have to be adjacent, so the regex has to match the character wherever it occurs, as long as the number of occurrences does not exceed the given number.

The matches also have to overlap: the regex has to find all substrings that contain the specified character the specified number of times, and it has to do this as many times as possible within the string.

Here is my attempt. It just brute-forces its way and finds almost every possible string that contains that character:

import re
c = raw_input()
a = re.compile(r'(?=(.*{0}.*?))(?=(.*{1}.*))(?=(.*?{2}.*))'.format(c, c, c))
print [ s for s in a.findall(raw_input()) ]

This works in that it tries to find all of them, but it sometimes does not find everything:

python
Python 2.7.1 (r271:86832, Jun 16 2011, 16:59:05) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import string_regex
1
10101
[('10101', '10101', '10101'), ('0101', '0101', '0101'), ('101', '101', '101'), ('01', '01', '01'), ('1', '1', '1')]

It does not find the string '10', which it is supposed to find 2 times. I need help making the regex match just what I want, not everything.

Upvotes: 5

Views: 20800

Answers (2)

Casimir et Hippolyte

Reputation: 89557

Try this kind of pattern (for 10 occurring between 0 and 2 times):

^(([^1]+|1+(?!0))*10){0,2}([^1]+|1+(?!0))*$

You can easily adapt it for rabbit between 0 and 3 times:

^(([^r]+|r+(?!abbit))*rabbit){0,3}([^r]+|r+(?!abbit))*$
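
As a rough illustration of how these two patterns behave, here is a minimal Python 3 sketch. The names AT_MOST_TWO_10, AT_MOST_THREE_RABBIT and accepts are hypothetical helpers, not part of the answer. Because the patterns are anchored with ^ and $, they accept or reject a whole string; they do not enumerate substrings.

```python
import re

# The two patterns from the answer, compiled verbatim.
AT_MOST_TWO_10 = re.compile(r'^(([^1]+|1+(?!0))*10){0,2}([^1]+|1+(?!0))*$')
AT_MOST_THREE_RABBIT = re.compile(
    r'^(([^r]+|r+(?!abbit))*rabbit){0,3}([^r]+|r+(?!abbit))*$')


def accepts(pattern, text):
    """Return True if the anchored pattern matches the whole text."""
    return pattern.match(text) is not None


print(accepts(AT_MOST_TWO_10, '10101'))    # '10' occurs twice
print(accepts(AT_MOST_TWO_10, '101010'))   # '10' occurs three times
```

The first call should be accepted and the second rejected, since every occurrence of 10 can only be consumed by the literal 10 inside the counted group.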

Upvotes: 7

Janne Karila

Reputation: 25197

Here's a list comprehension that finds all substrings containing the character 1 exactly two times, assuming the string consists only of the characters 0 and 1. To allow any characters, substitute [^1] for each 0.

import re

# Each lookahead match yields (prefix, suffix); slicing the suffix extends the prefix.
[prefix + suffix[:n]
    for prefix, suffix in re.findall(r'(?=((?:0*1){2})(0*))', '010100110')
        for n in range(len(suffix) + 1)]

Output:

['0101', '01010', '010100', '101', '1010', '10100', '01001', '1001', 
 '0011', '00110', '011', '0110', '11', '110']

Using a capturing group inside a lookahead makes findall give overlapping matches, but each match still begins at a different position. Here I'm using string slicing to produce the different substrings that start from the same position.
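
The same idea can be sketched for an arbitrary character and count. This is a hedged Python 3 generalization, not code from the answer: the function name substrings_with_count is hypothetical, and it assumes a single character and a count of at least 1. It applies the [^1]-for-0 substitution mentioned above and escapes the character with re.escape.

```python
import re


def substrings_with_count(text, char, count):
    """Return every substring of text that contains char exactly count times."""
    if len(char) != 1 or count < 1:
        raise ValueError("expects a single character and count >= 1")
    c = re.escape(char)
    # Group 1: shortest run from this position that ends at the count-th char.
    # Group 2: the following characters that are not char.
    pattern = re.compile(r'(?=((?:[^%s]*%s){%d})([^%s]*))' % (c, c, count, c))
    return [prefix + suffix[:n]
            for prefix, suffix in pattern.findall(text)
            for n in range(len(suffix) + 1)]


print(substrings_with_count('10101', '1', 1))
# ['1', '10', '01', '010', '1', '10', '01', '1']
```

For the question's input 10101 with a single 1, the result contains '10' twice, which is the count the asker expected.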

Upvotes: -1
