Hashken

Reputation: 4676

Python regex: How to specify an optional match (for potentially empty sub expression)?

I need to match the following sets of input:

foo_abc_bar  
foo_bar

and get "abc" or an empty string as the result.

So this is the regular expression I wrote:

r'foo_(abc|)[_|]bar'

But for some reason, this does not match the second string.

On further inspection, I found that [_|] does not match an empty string.

So, how do I solve this problem?

Upvotes: 1

Views: 3468

Answers (2)

NPE

Reputation: 500683

To make abc_ optional, you could use the question mark operator:

(abc_)?

Thus, the entire regex becomes:

r'foo_(abc_)?bar'

With this regex, the second underscore (if present) will become part of the capture group. If you don't want that, you could either remove it post-match with .rstrip('_') or use a slightly more complex regex:

r'foo_(?:(abc)_)?bar'
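As a minimal sketch of how the second regex behaves on both inputs (the helper name extract is hypothetical, and using fullmatch to anchor the whole string is an assumption about how the pattern is applied):

```python
import re

# The outer (?:...)? group makes "abc_" optional without capturing it;
# the inner group captures only "abc".
pattern = re.compile(r'foo_(?:(abc)_)?bar')

def extract(text):
    """Hypothetical helper: return 'abc', '' if the optional part is absent, or None if no match."""
    m = pattern.fullmatch(text)
    if m is None:
        return None
    # A group that did not participate in the match is None, so fall back to ''.
    return m.group(1) or ''

print(repr(extract('foo_abc_bar')))  # 'abc'
print(repr(extract('foo_bar')))      # ''
```

Note that an optional group that did not take part in the match returns None from group(1), not an empty string, which is why the helper converts it.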

Regarding "I found that [_|] does not match an empty string":

That's right. Square brackets denote a character class (also called a character set). [_|] matches exactly one character, either an underscore or a vertical bar, and never the empty string. In other words, the vertical bar loses its special meaning as the alternation operator when it appears inside a character class.
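A short check of this behaviour, using the pattern from the question (the variable names are only illustrative):

```python
import re

# Inside [...] the bar is a literal character: the class consumes exactly one '_' or '|'.
char_class = re.compile(r'[_|]')
print(char_class.fullmatch('_') is not None)  # True
print(char_class.fullmatch('|') is not None)  # True
print(char_class.fullmatch('') is not None)   # False

# The pattern from the question therefore always needs one extra character before "bar".
original = re.compile(r'foo_(abc|)[_|]bar')
print(original.fullmatch('foo_abc_bar') is not None)  # True
print(original.fullmatch('foo_bar') is not None)      # False
print(original.fullmatch('foo__bar') is not None)     # True
```

The last line shows what the original pattern accepts instead of foo_bar: with the empty alternative chosen, the class still requires a second underscore (or a vertical bar).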

Upvotes: 5

John Woo

Reputation: 263803

If you want to match strings shaped like this:

xxx_xxx_xxx
xxx_xxx

then you can use

([A-Za-z]{3})((_[A-Za-z]{3})+)?

For your specific input, this also works, although the group then captures "_abc" including the leading underscore:

r'foo(_abc)?_bar'

? makes the preceding group optional (it matches zero or one time).
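A rough sketch of what this pattern captures; the helper middle_part is hypothetical, and stripping the underscore with lstrip is just one way to get the bare "abc":

```python
import re

# The optional group includes the leading underscore.
pattern = re.compile(r'foo(_abc)?_bar')

def middle_part(text):
    """Hypothetical helper: return 'abc' or '' for a match, None otherwise."""
    m = pattern.fullmatch(text)
    if m is None:
        return None
    # group(1) is '_abc' when present and None when the optional group was skipped.
    return (m.group(1) or '').lstrip('_')

print(pattern.fullmatch('foo_abc_bar').group(1))  # _abc
print(pattern.fullmatch('foo_bar').group(1))      # None
print(repr(middle_part('foo_abc_bar')))           # 'abc'
print(repr(middle_part('foo_bar')))               # ''
```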

Upvotes: 1
