sakeesh

Reputation: 1049

Python regex: treat end of line as an OR condition in a search, similar to characters in a character class

Problem: Find all runs of two or more vowels that are sandwiched between two consonants. A run may also sit at the end of the line, in which case the end of the line stands in for the closing consonant. Example:

Input:

abaabaabaabaae

Expected output:

['aa','aa','aa','aae']

Solution tried:

import re
pattern=re.compile(r'(?:[^aeiouAEIOU])([AEIOUaeiou]{2,})(?=[^AEIOUaeiou])')
pattern.findall("abaabaabaabaae")

This gives ['aa','aa','aa']. It misses 'aae' because the end of the line is not part of the search criteria. How can I include the end-of-line anchor ($) so that it acts as an alternative (an OR condition) to the trailing consonant rather than a mandatory end of line?

Upvotes: 1

Views: 442

Answers (2)

Cary Swoveland

Reputation: 110745

You can extract matches of the regular expression

r'(?<=[b-df-hj-np-tv-z])[aeiou]{2,}(?=[b-df-hj-np-tv-z]|$)'


For the following string the matches are indicated.

_abaab_aabaabaaeraaa_babaa%abaa
   ^^     ^^ ^^^             ^^

I found it easiest to explicitly match consonants with the character class

[b-df-hj-np-tv-z]
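
As a quick check, the pattern above can be run against both the question's input and the annotated string. This is only a minimal sketch: the variable names are illustrative, and the pattern is lowercase-only unless re.IGNORECASE is passed.

```python
import re

# Consonant class from the answer; lowercase only unless re.IGNORECASE is passed.
CONSONANT = r'[b-df-hj-np-tv-z]'

# Lookbehind requires a consonant before the vowels; the lookahead accepts
# either a consonant or the end of the string (the "OR" the question asks for).
pattern = re.compile(rf'(?<={CONSONANT})[aeiou]{{2,}}(?={CONSONANT}|$)')

question_input = "abaabaabaabaae"
demo_string = "_abaab_aabaabaaeraaa_babaa%abaa"

print(pattern.findall(question_input))  # ['aa', 'aa', 'aa', 'aae']
print(pattern.findall(demo_string))     # ['aa', 'aa', 'aae', 'aa']
```

Because both neighbours are checked with lookarounds, the consonants are not consumed, so a single consonant can close one match and open the next.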


Upvotes: 1

Tim Biegeleisen

Reputation: 522646

I would use re.findall with the pattern (?<=[^\Waeiou])[aeiou]+(?![aeiou]):

import re

inp = "abaabaabaabaae"
matches = re.findall(r'(?<=[^\Waeiou])[aeiou]+(?![aeiou])', inp, flags=re.IGNORECASE)
print(matches)

This prints:

['aa', 'aa', 'aa', 'aae']

Here is an explanation of the regex pattern:

(?<=[^\Waeiou])  assert that what precedes is any word character, excluding a vowel
                 this also excludes the start of the input
[aeiou]+         match one or more vowel characters
(?![aeiou])      assert that what follows is not a vowel (includes end of string)
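
Note that [aeiou]+ accepts a single vowel, while the question asks for runs of two or more. The sketch below compares the pattern as posted with a hypothetical {2,} variant; the variant and the sample string are illustrative assumptions, not part of the answer.

```python
import re

# Pattern from the answer: one or more vowels after a non-vowel word character.
one_or_more = re.compile(r'(?<=[^\Waeiou])[aeiou]+(?![aeiou])', re.IGNORECASE)

# Hypothetical variant (not from the answer): require at least two vowels.
two_or_more = re.compile(r'(?<=[^\Waeiou])[aeiou]{2,}(?![aeiou])', re.IGNORECASE)

sample = "banaae"  # illustrative input, not from the question

print(one_or_more.findall(sample))  # ['a', 'aae']
print(two_or_more.findall(sample))  # ['aae']
```

In both versions the negative lookahead (?![aeiou]) succeeds at the end of the string, which is why the trailing 'aae' is captured without an explicit $.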

Upvotes: 0
