Reputation: 295
Given the following sample text:
Phase ID: 4 333 stuff
Phase ID: 4.5 333 stuff
Phase ID: 44.55 333 stuff
I'd like to capture just the number after the "Phase ID:". Here is what I currently have:
Phase ID:\K.*([0-9]*\.[0-9]|[0-9][^\s]+)
It works, except it also captures the 333. How can I stop the match after the first number?
Upvotes: 0
Views: 124
Reputation: 19
I would use:
import re
regex_string = "\d+\.\d+|\d+"
pattern = re.compile(regex_string)
print(pattern.search("Phase ID: 4 333 stuff"))
print(pattern.search("4.5 333 stuff"))
print(pattern.search("Phase ID: 4455 333 stuff"))
Upvotes: 0
Reputation: 163467
It works, except it also captures the 333. How can I stop the match after the first number?
The reason you are also matching 333 is because .*
is greedy and will match until the end of the line first.
Then it will backtrack until it can match one of the alternatives, and in this case [0-9][^\s]+
can match 33 and the pattern can return a match.
As you are already using \K
you can write the pattern as:
Phase ID:\s*\K\d+(?:\.\d+)?
Explanation
Phase ID:
Match literally\s*\K
Match optional whitespace chars and then forget what is matched until now\d+
Match 1+ digits(?:\.\d+)?
Match an optional decimal partSee a regex demo
Upvotes: 1
Reputation: 184
This statement is messing up what you want
[^\s]+
This will match any non-whitespace characters an unlimited number of times, hence why you are seeing those numbers after the space get included. If you change to something like:
Phase ID: ([0-9.]*).*
It will match the entire line, but the digits following the string "Phase ID:" would be captured in the first "capture group" (denoted by parenthesis) which you can extract. The whitespace will not be included in the first capture group because it isnt present in [0-9.]
I like to test regex on interactive sites like https://regex101.com/
Upvotes: 2
Reputation: 425208
Match non-spaces after Phase ID:
(?<=Phase ID: )\S+
See live demo.
If you need to be more prescriptive with the match being a number:
(?<=Phase ID: )\d+(?:\.\d+)?
See live demo.
Upvotes: 2
Reputation: 355
import re
string2 = "Phase ID: 44.55 333 stuff"
captured = float(re.findall(r"Phase ID: (\d*\.*\d*).*", string2)[0])
Upvotes: 1