Reputation: 8061
I'm new to Python, coming from a basic knowledge of Perl. I'm trying to capture a substring with a regex.
>>> a='Question 73 of 2943'
>>> import re
>>> re.match("Question.*(\d+)\s+of", a).group(0)
'Question 73 of'
>>> re.match("Question.*(\d+)\s+of", a).group(1)
'3'
What I wanted to do was to catch 73 in the group. I assumed that the parentheses would do that.
Upvotes: 1
Views: 94
Reputation: 615
.* is greedy. What this means is that it matches as many characters as it can (any character except line terminators) and only backtracks as far as it must for the rest of the pattern to match. That means the (\d+) capture group you have set up is left with just the last digit, 3, because .* has already consumed the 7. What you can do is make the .* part lazy by adding a ? so your regex would look like...
re.match(r"Question.*?(\d+)\s+of", a)
The difference between lazy and greedy regex is well explained here
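To make the greedy/lazy difference concrete, here is a small sketch that runs both patterns against the string from the question. The variable names `greedy` and `lazy` are just illustrative.

```python
import re

a = 'Question 73 of 2943'

# Greedy: .* grabs as much as possible, then backtracks one character at a
# time until (\d+)\s+of can match, so the group only gets the last digit.
greedy = re.match(r"Question.*(\d+)\s+of", a)

# Lazy: .*? grabs as little as possible, so (\d+) starts at the first digit.
lazy = re.match(r"Question.*?(\d+)\s+of", a)

print(greedy.group(1))  # 3
print(lazy.group(1))    # 73
```

Note the r"" prefix: it keeps Python from interpreting backslashes such as \d as string escapes before the regex engine sees them.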
Upvotes: 1
Reputation: 856
Your .* part will match any character, including digits. It is better to use a negated character class that excludes digits, [^\d]:
Question[^\d]*(\d+)\s+of
That should give you 73.
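A quick way to check this, as a sketch using the string from the question (the names `m` and `m_short` are only for illustration). The second pattern uses \D, which is the built-in shorthand for [^\d].

```python
import re

a = 'Question 73 of 2943'

# [^\d]* can only consume non-digits, so it stops right before the 7
# and the capture group receives the whole number.
m = re.match(r"Question[^\d]*(\d+)\s+of", a)
print(m.group(1))  # 73

# Same pattern written with the \D shorthand for "any non-digit".
m_short = re.match(r"Question\D*(\d+)\s+of", a)
print(m_short.group(1))  # 73
```

Because the class cannot match digits at all, there is no backtracking into the number, which is why this works without a lazy quantifier.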
Upvotes: 0
Reputation: 789
If you would like to capture 73 only, you can do
re.search(r'\d+', a).group()
which returns the first match in the string and stops searching there.
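As a rough illustration of that behaviour, the sketch below compares re.search with re.findall on the same string and guards against the no-match case, since re.search returns None when nothing matches. The names `first` and `all_numbers` are illustrative.

```python
import re

a = 'Question 73 of 2943'

# re.search scans left to right and returns only the first match object.
first = re.search(r'\d+', a)
if first is not None:
    print(first.group())  # 73

# re.findall, by contrast, returns every non-overlapping match.
all_numbers = re.findall(r'\d+', a)
print(all_numbers)  # ['73', '2943']
```

This approach ignores the surrounding "Question ... of" text entirely, so it only fits when the first number in the string is always the one you want.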
Upvotes: 0