Reputation: 107
I'm trying to parse a text document with data in the following format: 24036 -977. I need to split the two numbers into separate values, and I've done that with the following steps.
import re
line = "24036 -977"  # one line of the input document
values = re.search(r"(.*?)\s(.*)", line)
x = values.group(1)
y = values.group(2)
This does the job, but I was curious why using (.*?) in the second group causes the regex to fail. I tested it in an online regex tester (https://regex101.com/r/bM2nK1/1), and adding the ? causes the second group to return nothing. As far as I know, .*? means to match any character an unlimited number of times, as few times as possible, and .* is just the greedy version of that. What I'm confused about is why the non-greedy version .*? takes that definition to mean capturing nothing.
Upvotes: 0
Views: 31
Reputation: 4086
@iobender has pointed out the answer to your question.
But I think it's worth mentioning that if the numbers are separated by a space, you can just use split:
>>> '24036 -977'.split()
['24036', '-977']
This is simpler, easier to understand, and often faster than a regex.
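Since the original goal was to get two separate values, here is a minimal sketch of going one step further and converting the pieces to integers. It assumes each line holds exactly two whitespace-separated integers; the variable names are only illustrative.

```python
line = "24036 -977"

# split() with no arguments splits on any run of whitespace
x_str, y_str = line.split()

# int() accepts a leading minus sign, so "-977" converts directly
x, y = int(x_str), int(y_str)

print(x, y)
```

If a line might contain more or fewer than two fields, the unpacking raises a ValueError, which may or may not be what you want.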
Upvotes: 1
Reputation: 3486
Because *? means to match the previous token (here the .) as few times as possible, and with nothing after it in the pattern, the fewest possible is 0 times. If you would like it to extend to the end of the string, add a $ after it, which matches the end of the string. If you would like it to match at least one character, use + instead of *.
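To make the difference concrete, here is a small sketch that runs the variants side by side on the sample line from the question. The variable names are just for illustration.

```python
import re

line = "24036 -977"

# Greedy second group: takes everything after the whitespace
greedy = re.search(r"(.*?)\s(.*)", line)

# Lazy second group with nothing after it: zero characters is enough
lazy = re.search(r"(.*?)\s(.*?)", line)

# Lazy second group anchored with $: forced to extend to the end
anchored = re.search(r"(.*?)\s(.*?)$", line)

# Lazy + instead of *: must match at least one character
at_least_one = re.search(r"(.*?)\s(.+?)", line)

print(greedy.groups())        # ('24036', '-977')
print(lazy.groups())          # ('24036', '')
print(anchored.groups())      # ('24036', '-977')
print(at_least_one.groups())  # ('24036', '-')
```

Note that +? still stops as early as it can, so it captures only the first character after the space, not the whole number.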
The first group .*? matches 24036 because it is followed by the \s token, so the fewest characters the .*? can match while still being followed by a \s is 24036.
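A quick way to see this is to compare a lazy and a greedy first group on a line with more than one space. The input "10 20 30" is a made-up example, not data from the question.

```python
import re

line = "10 20 30"  # hypothetical input with two spaces

# Lazy: expands one character at a time until \s can match,
# so it stops at the first space
lazy_first = re.search(r"(.*?)\s", line).group(1)

# Greedy: grabs the whole line, then backtracks until \s can match,
# so it stops at the last space
greedy_first = re.search(r"(.*)\s", line).group(1)

print(repr(lazy_first))    # '10'
print(repr(greedy_first))  # '10 20'
```

In both cases it is the token after the group that decides where the group ends; laziness only picks the shortest option that still lets the rest of the pattern match.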
Upvotes: 3