Mohamed Mohsen

Reputation: 187

Regex: get all possible occurrences in Python

I have a string s = '10000'. Using only Python's re.findall, I need to count how many times the pattern 0\d0 occurs in s. For example, for s = '10000' it should return 2.

Explanation: the first occurrence is the 000 at indices 1-3 (1[000]0), while the second occurrence is the 000 at indices 2-4 (10[000]).

I just need the number of occurrences; I am not interested in the matched text itself.

I've tried the following regex statements:

re.findall(r'(0\d0)', s) #output: ['000']
re.findall(r'(0\d0)*', s) #output: ['', '', '000', '', '', '']

Finally, how can I make this regex generic, so that it matches any digit, followed by any digit, followed by the same first digit again?

Upvotes: 0

Views: 169

Answers (1)

ctwheels

Reputation: 22837

How to get all possible occurrences?

The regex

As I mentioned in my comment, you can use the following pattern:

(?=(0\d0))

How it works:

  • (?=...) is a positive lookahead that asserts that what follows matches the enclosed pattern. It doesn't consume characters, so the regex can test for a match at every position in the string; a consuming match would instead resume searching after the characters it consumed.
  • (0\d0) is a capture group matching 0, then any digit, then 0

The code

Your code becomes:


re.findall(r'(?=(0\d0))', s)

The result is:

['000', '000']

The Python documentation for re.findall states the following:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

This means that our matches are the results of capture group 1 rather than the full match as many would expect.
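Since the question only asks for the number of occurrences, a minimal sketch is to take the length of the list that re.findall returns. The variable names plain, overlapping and count are just illustrative:

```python
import re

s = '10000'

# A consuming pattern resumes after the matched characters,
# so the second, overlapping occurrence is skipped.
plain = re.findall(r'0\d0', s)

# The lookahead matches an empty string at each position,
# and the capture group reports what was seen ahead.
overlapping = re.findall(r'(?=(0\d0))', s)

count = len(overlapping)
print(plain)        # ['000']
print(overlapping)  # ['000', '000']
print(count)        # 2
```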


How to generalize the pattern?

The regex

You can use the following pattern:

(\d)\d\1

How this works:

  • (\d) captures any digit into capture group 1
  • \d matches any digit
  • \1 is a backreference that matches the same text as most recently matched by capture group 1
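As a rough illustration of why this pattern cannot be passed to re.findall as-is: with a single capture group, findall returns only the captured digit and not the full match. The sketch below compares it with re.finditer, which exposes the full match through group(0); the names groups and full are hypothetical:

```python
import re

s = '10000'

# With one capture group, findall returns the group's text only.
groups = re.findall(r'(\d)\d\1', s)

# finditer yields match objects; group(0) is the whole match.
full = [m.group(0) for m in re.finditer(r'(\d)\d\1', s)]

print(groups)  # ['0']
print(full)    # ['000']
```

Neither call finds the overlapping occurrence, because the pattern still consumes the characters it matches.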

The code

Your code becomes:


x = re.findall(r'(?=((\d)\d\2))', s)
print([n[0] for n in x])

Note: The code above has two capture groups: the outer group 1 holds the whole three-digit match and the inner group 2 holds the first digit, so the backreference must change to \2 to match correctly. With two capture groups, re.findall returns tuples, as the documentation states, so a list comprehension that takes the first element of each tuple gives the expected results.

The result is:

['000', '000']
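If only the count is needed, one possible alternative (not part of the answer above) is to count the zero-width matches from re.finditer, which avoids the extra outer group and the tuples. The function name count_sandwiched is hypothetical:

```python
import re

def count_sandwiched(s):
    """Count overlapping digit-anydigit-samedigit runs in s."""
    # The lookahead consumes nothing, so every start position is tested.
    # \1 refers to the single capture group inside the lookahead.
    return sum(1 for _ in re.finditer(r'(?=(\d)\d\1)', s))

print(count_sandwiched('10000'))  # 2
print(count_sandwiched('12121'))  # 3
print(count_sandwiched('12345'))  # 0
```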

Upvotes: 2
