ashr81

Reputation: 650

Re-evaluate a Character in Python Regex

The Python regex below gives 2 as output, but the output should be 4. I want to count the occurrences of a vowel that has a consonant both before and after it. However, the regex skips a vowel whenever the consonant before it was already used as the closing consonant of the previous match.

Example: in 'lolololol', my condition is satisfied by indices 0 to 2. The regex then moves on to index 3, but I want it to check once again starting from the preceding index, 2. How can this be done with Python regex? Below is my code:

import re
p = re.findall(r'[b-df-hj-np-tv-z][aeiou][b-df-hj-np-tv-z]', 'lolololol', re.IGNORECASE)
print(len(p))  # prints 2, but 4 is wanted

Upvotes: 2

Views: 115

Answers (2)

vks

Reputation: 67978

import re
p = re.findall(r'(?<=[b-df-hj-np-tv-z])[aeiou](?=[b-df-hj-np-tv-z])', 'lolololol', re.IGNORECASE)
print(len(p))

Use lookarounds when matches overlap. The lookbehind and the lookahead only check the neighbouring consonants without consuming them; otherwise, the characters you have already matched will not be available for the following match attempt. See demo.

https://regex101.com/r/lR1eC9/14

It has 4 matches.
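To see why the count changes, it may help to compare the spans each pattern reports. The following is only a small sketch: the names CONS, consuming, and lookaround are made up here for illustration and do not come from the question.

```python
import re

# Consonant class taken from the question; the variable names are illustrative.
CONS = '[b-df-hj-np-tv-z]'
text = 'lolololol'

# Original pattern: both consonants are consumed as part of the match.
consuming = re.compile(CONS + '[aeiou]' + CONS, re.IGNORECASE)
# Lookaround pattern: only the vowel is consumed, the consonants are just checked.
lookaround = re.compile('(?<=' + CONS + ')[aeiou](?=' + CONS + ')', re.IGNORECASE)

consuming_spans = [m.span() for m in consuming.finditer(text)]
lookaround_spans = [m.span() for m in lookaround.finditer(text)]

print(consuming_spans)   # [(0, 3), (4, 7)]
print(lookaround_spans)  # [(1, 2), (3, 4), (5, 6), (7, 8)]
```

The consuming pattern uses up the l at index 2, so the vowel at index 3 has no consonant left in front of it. The lookaround pattern matches single vowels only, so every l stays available.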

Upvotes: 3

Wiktor Stribiżew

Reputation: 627082

You should understand first what your regex is doing.

It matches the first l with [b-df-hj-np-tv-z], then the vowel o with [aeiou], and then the following l with [b-df-hj-np-tv-z]. The match lol is found and returned, and the regex index is now at the second o. This o cannot be matched by [b-df-hj-np-tv-z], so the attempt fails and the index moves on to the next l. A second match is found there: lol. Then the next o again cannot be matched, and the final l is not matched either, as there are no characters left after it.
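The walk-through above can be checked with a short sketch that tries the original pattern at every start position and compares that with what the scanner actually reports. The names possible_starts and reported_starts are illustrative only.

```python
import re

p = re.compile(r'[b-df-hj-np-tv-z][aeiou][b-df-hj-np-tv-z]', re.IGNORECASE)
text = 'lolololol'

# Positions where the pattern would match if tried there in isolation.
possible_starts = [i for i in range(len(text)) if p.match(text, i)]

# Positions actually reported: the scan resumes after the end of each match,
# so a start that falls inside a previous match is never tried.
reported_starts = [m.start() for m in p.finditer(text)]

print(possible_starts)  # [0, 2, 4, 6]
print(reported_starts)  # [0, 4]
```

The starts at 2 and 6 are lost because the l at those positions was already consumed as the last character of the previous match.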

You only need to use a look-ahead (?=[b-df-hj-np-tv-z]) instead of a [b-df-hj-np-tv-z] so that the character is only checked and not consumed:

import re
p = re.compile(r'[b-df-hj-np-tv-z][aeiou](?=[b-df-hj-np-tv-z])') 
#                                        ^^^                 ^ 
test_str = "lolololol"
print(p.findall(test_str))
print(len(p.findall(test_str)))

See IDEONE demo

That way, the trailing "syllable" boundary is checked, but not consumed and is available to be tested during the next regex iteration.
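If the full consonant-vowel-consonant substrings are wanted rather than the two-character lo pieces, one possible variant (not what the code above does) is to wrap the whole pattern in a capturing group inside a lookahead, so that nothing is consumed at all. This is a sketch under that assumption; the names overlapping and matches are hypothetical.

```python
import re

# The whole pattern sits inside a lookahead, so each match is zero-width
# and the capturing group carries the text that was looked at.
overlapping = re.compile(r'(?=([b-df-hj-np-tv-z][aeiou][b-df-hj-np-tv-z]))', re.IGNORECASE)

# With exactly one group, findall returns the group contents.
matches = overlapping.findall('lolololol')

print(matches)       # ['lol', 'lol', 'lol', 'lol']
print(len(matches))  # 4
```

The count is the same as with the single lookahead; the difference is only in what each returned item contains.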

A must-read article about how Lookarounds Stand their Ground at rexegg.com.

Upvotes: 3
