b1onic

Reputation: 239

How to change a quantifier in a Regex based on a condition?

I would like to find words of length >= 1 that may contain an apostrophe (') or a hyphen (-) inside them. Here is a test string:

a quake-prone area- (aujourd'hui-

In Python, I'm currently using this regex:

import re

string = "a quake-prone area- (aujourd'hui-"
RE_WORDS = re.compile(r'[a-z]+[-\']?[a-z]+')
words = RE_WORDS.findall(string)

I would like to get this result:

>>> words
[u'a', u'quake-prone', u'area', u"aujourd'hui"]

but I get this instead:

>>> words
[u'quake-prone', u'area', u"aujourd'hui"]

Unfortunately, because of the final + quantifier, it skips every word of length 1. If I use the * quantifier instead, it finds a, but it also returns area- instead of area.
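A small Python 3 sketch reproducing both behaviours on the same test string (Python 3 prints plain str values, so there are no u prefixes). The names plus_re and star_re are just illustrative.

```python
import re

text = "a quake-prone area- (aujourd'hui-"

# Original pattern: the final [a-z]+ needs at least one more letter,
# so a one-letter word such as "a" can never match.
plus_re = re.compile(r"[a-z]+[-']?[a-z]+")

# Variant with *: "a" now matches, but the optional separator is also
# allowed to match with zero letters after it, so "area-" is returned.
star_re = re.compile(r"[a-z]+[-']?[a-z]*")

print(plus_re.findall(text))  # ['quake-prone', 'area', "aujourd'hui"]
print(star_re.findall(text))  # ['a', 'quake-prone', 'area-', "aujourd'hui"]
```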

So how could I create a conditional regex that says: if the word contains an apostrophe or a hyphen, use the + quantifier; otherwise, use the * quantifier?
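For reference, Python's re module does support conditional groups of the form (?(id)yes|no), where the no branch may be omitted. A sketch of a literal "conditional" version, assuming Python 3: capture the separator in group 1 and require more letters only if that group matched. COND_RE is a hypothetical name. Because the pattern contains a capturing group, findall() would return only that group, so finditer() is used to collect whole matches.

```python
import re

text = "a quake-prone area- (aujourd'hui-"

# Group 1 captures an optional hyphen or apostrophe.
# (?(1)[a-z]+) is a conditional: if group 1 matched, at least one more
# letter is required; if it did not, nothing further is required.
COND_RE = re.compile(r"[a-z]+([-'])?(?(1)[a-z]+)")

# findall() would return group 1 here, so take group(0) from finditer().
words = [m.group(0) for m in COND_RE.finditer(text)]
print(words)  # ['a', 'quake-prone', 'area', "aujourd'hui"]
```

On "area-", the separator is first captured, the conditional then fails because no letter follows, and the engine backtracks to the branch where group 1 did not match, so the result is area. An optional non-capturing group achieves the same thing with less machinery.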



Answers (1)

Avinash Raj

Reputation: 174816

I suggest making the trailing [-\']?[a-z]+ part optional as a whole: put [-\'][a-z]+ into a non-capturing group and add a ? quantifier after that group, instead of making only the separator optional.

>>> import re
>>> string = "a quake-prone area- (aujourd'hui-"
>>> RE_WORDS = re.compile(r'[a-z]+(?:[-\'][a-z]+)?')
>>> RE_WORDS.findall(string)
['a', 'quake-prone', 'area', "aujourd'hui"]

The reason a is not matched is that your regex contains two [a-z]+ tokens, so every match must contain at least two lowercase letters.
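A quick Python 3 check of that claim, using fullmatch() (available since Python 3.4) so the whole word must match. The sample words and the names QUESTION_RE and ANSWER_RE are made up for illustration.

```python
import re

# Pattern from the question: two mandatory [a-z]+ runs.
QUESTION_RE = re.compile(r"[a-z]+[-']?[a-z]+")
# Pattern from the answer: the second run sits inside an optional group.
ANSWER_RE = re.compile(r"[a-z]+(?:[-'][a-z]+)?")

for word in ["a", "ab", "o'k"]:
    # fullmatch() returns None unless the entire word matches.
    print(word,
          QUESTION_RE.fullmatch(word) is not None,
          ANSWER_RE.fullmatch(word) is not None)
# a False True
# ab True True
# o'k True True
```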

Note that the regex I suggested won't match area-, because the optional group (?:[-\'][a-z]+)? requires at least one lowercase letter immediately after the - (or '). If no such letter follows, the group is skipped and the match ends just before the hyphen. That is why you get area instead of area- in the output: there is no lowercase letter after the -.
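The stopping behaviour can be observed directly with re.match() in Python 3. The input "area-code" is a made-up example with a letter after the hyphen.

```python
import re

ANSWER_RE = re.compile(r"[a-z]+(?:[-'][a-z]+)?")

# No letter follows the hyphen, so the optional group is skipped
# and the match ends just before the "-".
m1 = ANSWER_RE.match("area-")
print(m1.group(0), m1.end())  # area 4

# With a letter after the hyphen, the optional group matches too.
m2 = ANSWER_RE.match("area-code")
print(m2.group(0))  # area-code
```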

