Barry Ralphs
Barry Ralphs

Reputation: 295

Regex help, match numbers, stop at whitespace

Given the following sample text:

Phase ID: 4 333 stuff
Phase ID: 4.5 333 stuff
Phase ID: 44.55 333 stuff

I'd like to capture just the number after the "Phase ID:". Here is what I currently have:

Phase ID:\K.*([0-9]*\.[0-9]|[0-9][^\s]+)

It works, except it also captures the 333. How can I stop the match after the first number?

Upvotes: 0

Views: 124

Answers (5)

Mister Joe
Mister Joe

Reputation: 19

I would use:

import re 
regex_string = "\d+\.\d+|\d+"
pattern = re.compile(regex_string)
print(pattern.search("Phase ID: 4 333 stuff"))
print(pattern.search("4.5 333 stuff"))
print(pattern.search("Phase ID: 4455 333 stuff"))

Upvotes: 0

The fourth bird
The fourth bird

Reputation: 163467

It works, except it also captures the 333. How can I stop the match after the first number?

The reason you are also matching 333 is because .* is greedy and will match until the end of the line first.

Then it will backtrack until it can match one of the alternatives, and in this case [0-9][^\s]+ can match 33 and the pattern can return a match.

As you are already using \K you can write the pattern as:

Phase ID:\s*\K\d+(?:\.\d+)?

Explanation

  • Phase ID: Match literally
  • \s*\K Match optional whitespace chars and then forget what is matched until now
  • \d+ Match 1+ digits
  • (?:\.\d+)? Match an optional decimal part

See a regex demo

Upvotes: 1

Ryan
Ryan

Reputation: 184

This statement is messing up what you want

[^\s]+

This will match any non-whitespace characters an unlimited number of times, hence why you are seeing those numbers after the space get included. If you change to something like:

Phase ID: ([0-9.]*).*

It will match the entire line, but the digits following the string "Phase ID:" would be captured in the first "capture group" (denoted by parenthesis) which you can extract. The whitespace will not be included in the first capture group because it isnt present in [0-9.]

I like to test regex on interactive sites like https://regex101.com/

Upvotes: 2

Bohemian
Bohemian

Reputation: 425208

Match non-spaces after Phase ID:

(?<=Phase ID: )\S+

See live demo.


If you need to be more prescriptive with the match being a number:

(?<=Phase ID: )\d+(?:\.\d+)?

See live demo.

Upvotes: 2

William Rosenbaum
William Rosenbaum

Reputation: 355

import re
string2 = "Phase ID: 44.55 333 stuff"
captured = float(re.findall(r"Phase ID: (\d*\.*\d*).*", string2)[0])

Upvotes: 1

Related Questions