Justin Wang

Reputation: 107

Python Regex Behaviour

I'm trying to parse a text document with data in the following format: 24036 -977. I need to split the two numbers into separate values, which I've done as follows:

import re

text = "24036 -977"  # sample line; in practice this comes from the document
values = re.search(r"(.*?)\s(.*)", text)
x = values.group(1)
y = values.group(2)

This does the job, but I'm curious why using (.*?) in the second group causes the regex to fail. I tested it in an online regex tester (https://regex101.com/r/bM2nK1/1), and adding the ? causes the second group to return nothing. As far as I know, .*? means match any character an unlimited number of times, as few times as possible, and .* is just the greedy version of that. What confuses me is why the non-greedy version .*? takes that definition to mean capturing nothing.

Upvotes: 0

Views: 31

Answers (2)

NDevox

Reputation: 4086

@iobender has pointed out the answer to your question.

But I think it's worth mentioning that if the numbers are separated by whitespace, you can just use split:

>>> '24036 -977'.split()
['24036', '-977']

This is simpler, easier to understand and often faster than regex.
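If you need numbers rather than strings, a minimal sketch (assuming each line holds exactly two whitespace-separated integers, and using `line` as a hypothetical variable name) could look like this:

```python
line = "24036 -977"  # hypothetical sample line

# split() with no arguments splits on any run of whitespace;
# int() handles the leading minus sign.
x, y = (int(part) for part in line.split())
print(x, y)
```

The unpacking raises a ValueError if a line does not contain exactly two values, which may or may not be what you want for malformed input.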

Upvotes: 1

iobender

Reputation: 3486

Because the ? makes the preceding quantifier, the *, match as few times as possible, which is 0 times. If you would like it to extend to the end of the string, add a $, which matches the end of the string. If you would like it to match at least one character, use + instead of *.

The first group .*? matches 24036 because the \s token follows it, so the fewest characters .*? can match while still being followed by a \s is 24036.
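To see the three cases side by side, here is a small sketch using the sample string from the question (the variable names are only for illustration):

```python
import re

text = "24036 -977"

# Lazy second group with nothing after it: it is satisfied by zero characters.
lazy = re.search(r"(.*?)\s(.*?)", text)
print(lazy.groups())          # ('24036', '')

# A trailing $ forces the lazy group to extend to the end of the string.
anchored = re.search(r"(.*?)\s(.*?)$", text)
print(anchored.groups())      # ('24036', '-977')

# With +? the group must match at least one character, so it takes exactly one.
at_least_one = re.search(r"(.*?)\s(.+?)", text)
print(at_least_one.groups())  # ('24036', '-')
```

Note that switching to + alone only captures the minus sign here, so for this input the $ anchor (or a greedy second group) is what actually captures the whole number.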

Upvotes: 3
