Fred

Reputation: 381

Matching optional full string with regex in Python

I've gone through the HOWTO and the re module docs several times, and I'm still confused about how optionality and grouping interact in Python regexes. What I want is to match everything inside a group, or not at all, but I'm finding that substrings are matching. Here's a minimal example:

>> re.compile(r"(test)?").search("tes")
<_sre.SRE_MATCH at 0xBlahBlah>

I expected that not to match, since I have the entire string test marked as optional. What (part of the docs) am I not understanding?

A version of the problem that's closer to what I'm actually interested in is as follows:

>> re.compile(r"(distance|mileage)(\sbetween)?").search("distancebetween")
<_sre.SRE_MATCH at 0xBlahblah>

Why is that whitespace not being forced to match?

EDIT 2017-01-04 The answers thus far are helpful, but I think I didn't explain my need sufficiently clearly.

In short, I want a regex that will match foo or bar (in their entirety) or foo baz or bar baz (in their entirety) and nothing else.

>> m = re.compile(r"(foo|bar)(\sbaz)?")
>> m.search("foo ba")
<_sre.SRE_Match at 0xBlahblah>
>> m.search("foo ba").span()
(0, 3)

So I see that what's happening is that it's matching on foo and then not caring about what's further downstream. How do I get it to match only on baz or nothing at all?

Upvotes: 2

Views: 454

Answers (3)

Guy

Reputation: 647

With the ? in both cases, you're saying you want either 0 or 1 occurrences of the group. So with "(test)?" you match either "test", which doesn't match here, or an empty string, which is found at the very start of the string.

In the second one, "(distance|mileage)(\sbetween)?", there are four possible matches: "distance", "mileage", "distance between" or "mileage between".

None of these, though, have to be the whole string, so there can be text before or after. If you don't want that, you need ^regex to match only at the start, regex$ to match only at the end, or ^regex$ to match only the whole string.
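As a rough sketch of the anchoring idea, here is the foo/bar pattern from the question's edit, checked once with search() and once against the whole string. The helper name matches_whole is made up for illustration, and fullmatch() assumes Python 3.4 or later.

```python
import re

# Pattern from the question's edit, written as a raw string for the \s escape.
pattern = re.compile(r"(foo|bar)(\sbaz)?")

# The same pattern with explicit anchors, for use with search().
anchored = re.compile(r"^(foo|bar)(\sbaz)?$")


def matches_whole(text):
    """Hypothetical helper: True only if the entire string matches."""
    return pattern.fullmatch(text) is not None


# search() is satisfied with a match anywhere, so "foo ba" matches on "foo".
print(pattern.search("foo ba").span())

# Requiring the whole string to match rejects the partial input.
for candidate in ["foo", "bar", "foo baz", "bar baz", "foo ba", "foobaz"]:
    print(repr(candidate), matches_whole(candidate))
```

One small difference to keep in mind: $ also matches just before a trailing newline, so the anchored version accepts "foo\n" while fullmatch() does not.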

Upvotes: 1

Frank T

Reputation: 9066

For what you're describing I don't think you want to use an optional match. I think you want exactly the regexes you have but without the ?.

For your first example:

>>> re.compile(r"(test)").search("tes")
>>> re.compile(r"(test)").search("test")
<_sre.SRE_Match object at 0x104c64210>
>>> re.compile(r"(test)").search("testing")
<_sre.SRE_Match object at 0x104c64198> 

For your second example:

>>> re.compile(r"(distance|mileage)(\sbetween)").search("distancebetween")
>>> re.compile(r"(distance|mileage)(\sbetween)").search("distance between")
<_sre.SRE_Match object at 0x104bf5608>
>>> re.compile(r"(distance|mileage)(\sbetween)").search("distance ")

Upvotes: 1

Ilya V. Schurov

Reputation: 8097

Let's look at what is matched:

import re
m = re.compile(r"(test)?").search("tes")
print(m.span())
# prints (0, 0)

It's an empty string. Why?

Because ? here means zero or one time (just like {0,1}). So the first group can match either the string test or the empty string (which is what we have).

Here is a quote from the docs:

'?' Causes the resulting RE to match 0 or 1 repetitions of the preceding RE. ab? will match either ‘a’ or ‘ab’.
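To make the empty match visible, here is a small sketch that inspects the match object for both inputs. The variable names are just for illustration.

```python
import re

optional = re.compile(r"(test)?")

# On "tes" the group cannot match, so the regex matches zero repetitions:
# an empty string at position 0.
m = optional.search("tes")
empty_span = m.span()
empty_group = m.group(1)  # None, because the group did not take part in the match

# On "test" the group matches once and covers the whole word.
full_span = optional.search("test").span()

print(empty_span, empty_group, full_span)
```

So the match object in the question is truthy even though nothing from the string was consumed; checking span() or group(1) shows the difference.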

Upvotes: 4
