Re-evaluate a Character in Python Regex

Question

For the regex below in Python, the output is 2, but it should be 4. I want to count the occurrences of a vowel that has a consonant both before and after it. However, the regex skips the next consonant even when a vowel follows it.

Example: in 'lolololol', my condition is satisfied at indices 0 to 2. The search then moves on to index 3, but I want the regex to check again starting from the preceding index, that is, from 2. How can this be done with Python regex? Below is my code:

import re
p = re.findall('[b-df-hj-np-tv-z][aeiou][b-df-hj-np-tv-z]','lolololol',re.IGNORECASE)
print(len(p))

Wiktor Stribiżew · Accepted Answer

First, you should understand what your regex is doing.

It matches the first l with [b-df-hj-np-tv-z], then the vowel o with [aeiou], and then the following l with [b-df-hj-np-tv-z]. The match lol is found and returned, and the regex index is now at the second o. This o cannot be matched by [b-df-hj-np-tv-z], so the attempt fails and the index moves on to the next l. There a second match is found: lol. After that, o again cannot be matched, and the final lo is not matched because there is no third character after it.
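The walkthrough above can be checked by printing where each match starts and ends. This is a minimal sketch using re.finditer on the original pattern; the names CONSONANT, pattern and spans are introduced here for illustration only.

```python
import re

# Consonant class taken from the question; the variable name is illustrative.
CONSONANT = '[b-df-hj-np-tv-z]'
pattern = re.compile(CONSONANT + '[aeiou]' + CONSONANT, re.IGNORECASE)

# Collect (start, end, matched text) for every non-overlapping match.
spans = [(m.start(), m.end(), m.group()) for m in pattern.finditer('lolololol')]

for start, end, text in spans:
    print(start, end, text)
# 0 3 lol
# 4 7 lol
```

The second match starts at index 4, not 2, because the l at index 2 was already consumed as the end of the first match.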

You only need to use a look-ahead (?=[b-df-hj-np-tv-z]) instead of a [b-df-hj-np-tv-z] so that the character is only checked and not consumed:

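import re
# The trailing consonant is only checked by the lookahead, not consumed.
p = re.compile(r'[b-df-hj-np-tv-z][aeiou](?=[b-df-hj-np-tv-z])', re.IGNORECASE)
#                                        ^^^                 ^
test_str = "lolololol"
print(p.findall(test_str))       # ['lo', 'lo', 'lo', 'lo']
print(len(p.findall(test_str)))  # 4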

See IDEONE demo

That way, the trailing "syllable" boundary is checked, but not consumed and is available to be tested during the next regex iteration.
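To see which characters are consumed, the match spans can be compared directly. The sketch below also shows a possible variant, not part of the answer above, that adds a lookbehind so that only the vowel itself is consumed. The names lookahead, vowel_only and the result variables are illustrative. For this input both patterns should find four matches.

```python
import re

CONSONANT = '[b-df-hj-np-tv-z]'
text = 'lolololol'

# Lookahead only: the leading consonant is consumed, the trailing one is not.
lookahead = re.compile(CONSONANT + '[aeiou](?=' + CONSONANT + ')', re.IGNORECASE)
lookahead_spans = [m.span() for m in lookahead.finditer(text)]

# Variant (an assumption, not from the answer): lookbehind plus lookahead,
# so neither consonant is consumed and each match is the vowel alone.
vowel_only = re.compile('(?<=' + CONSONANT + ')[aeiou](?=' + CONSONANT + ')',
                        re.IGNORECASE)
vowel_positions = [m.start() for m in vowel_only.finditer(text)]

print(lookahead_spans)       # [(0, 2), (2, 4), (4, 6), (6, 8)]
print(vowel_positions)       # [1, 3, 5, 7]
print(len(vowel_positions))  # 4
```

Each lookahead match ends right before the consonant that starts the next match, which is why the l at indices 2, 4 and 6 can be reused.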

A must-read article on this topic is Lookarounds Stand their Ground at rexegg.com.
