Extract multiple numbers from a string after a specific substring

I have the following string:

H: 290​‐​314 P: 280​‐​301+330​​​​U+200B+331​string‐​305+351+338​‐​308+310 [2]

I need all the numbers that come after P:, namely [280,301,330,331,305,351,338,308,310].

Note that the string contains U+200B, which is a character code, not a number, and should be ignored.

I tried #P:\s((\d+)[​\‐]+)+# but that doesn't work.

Upvotes: 1

Views: 197

Answers (2)

mickmackusa

Reputation: 47991

I'd use the \G (continue) metacharacter this way:

$str = 'H: 290‐314 P: 280‐301+330U+200B+331string‐305+351+338‐308+310 [2]';
preg_match_all('~(?:P: |\G(?!^)(?:U\+200B)?[^\d ]+)\K\d+~', $str, $m);
var_export($m[0]);

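Start from P: and match the first run of digits. For each following number, \G(?!^) continues from where the previous match ended, consuming an optional blacklisted U+200B string and then one or more non-digit, non-space characters as delimiters. \K forgets everything consumed so far, so only the digits end up in the match.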
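
To make the role of \K more concrete, here is a minimal sketch that compares the pattern above with a variant that uses a capture group instead. The variable names ($withK, $withGroup, $a, $b) are only illustrative and not part of the original answer.

```php
<?php
$str = 'H: 290‐314 P: 280‐301+330U+200B+331string‐305+351+338‐308+310 [2]';

// Pattern from the answer: \K drops the delimiters from the full match.
$withK = '~(?:P: |\G(?!^)(?:U\+200B)?[^\d ]+)\K\d+~';

// Illustrative variant: no \K, the digits are captured in group 1 instead.
$withGroup = '~(?:P: |\G(?!^)(?:U\+200B)?[^\d ]+)(\d+)~';

preg_match_all($withK, $str, $a);
preg_match_all($withGroup, $str, $b);

// $a[0] holds only the digits.
echo implode(',', $a[0]), PHP_EOL;

// $b[0] holds the full matches including delimiters; $b[1] holds the digits.
echo implode(' | ', $b[0]), PHP_EOL;
echo implode(',', $b[1]), PHP_EOL;
```

Both approaches should yield the same numbers; with \K they are available directly in the full-match array, so no capture group is needed.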

Upvotes: 1

Wiktor Stribiżew

Reputation: 627093

You can use

(?:\G(?!\A)(?:[^\d\s]*200B)?|P:\h*)[^\d\s]*\K(?!200B)\d+

Details:

  • (?:\G(?!\A)(?:[^\d\s]*200B)?|P:\h*) - either the end of the previous successful match, optionally followed by zero or more chars other than digits/whitespace and then 200B, or P: and zero or more horizontal whitespace chars
  • [^\d\s]* - zero or more chars other than digits and whitespace
  • \K - match reset operator that discards the text matched so far from the overall match memory buffer
  • (?!200B)\d+ - one or more digits that do not start a 200B char sequence.
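
As a rough illustration of what the \G anchoring and the 200B handling buy you, the sketch below compares a naive \d+ with the pattern above on the sample string. The variable names ($naive, $anchored) are just for this example.

```php
<?php
$text = 'H: 290‐314 P: 280‐301+330U+200B+331string‐305+351+338‐308+310 [2]';

// Naive pattern: grabs every digit run, including the ones before "P:",
// the "200" inside "U+200B", and the "2" inside "[2]".
preg_match_all('~\d+~', $text, $naive);

// Anchored pattern: only the chain of numbers that starts right after "P:".
preg_match_all(
    '~(?:\G(?!\A)(?:[^\d\s]*200B)?|P:\h*)[^\d\s]*\K(?!200B)\d+~',
    $text,
    $anchored
);

echo implode(',', $naive[0]), PHP_EOL;
echo implode(',', $anchored[0]), PHP_EOL;
```

The first line printed contains the unwanted 290, 314, 200 and 2; the second contains only the nine numbers after P:. The chain stops at the space before [2] because [^\d\s]* cannot cross whitespace.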

See the PHP demo:

$text = 'H: 290‐314 P: 280‐301+330U+200B+331string‐305+351+338‐308+310 [2]';
if (preg_match_all('~(?:\G(?!\A)(?:[^\d\s]*200B)?|P:\h*)[^\d\s]*\K(?!200B)\d+~', $text, $matches)) {
    print_r($matches[0]);
}

Output:

Array
(
    [0] => 280
    [1] => 301
    [2] => 330
    [3] => 331
    [4] => 305
    [5] => 351
    [6] => 338
    [7] => 308
    [8] => 310
)
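
Note that preg_match_all returns the matches as strings. If you need actual integers in the [280,301,...] form from the question, a small sketch along these lines should do it (the $numbers variable is just an illustrative name):

```php
<?php
$text = 'H: 290‐314 P: 280‐301+330U+200B+331string‐305+351+338‐308+310 [2]';
$pattern = '~(?:\G(?!\A)(?:[^\d\s]*200B)?|P:\h*)[^\d\s]*\K(?!200B)\d+~';

// Cast each matched digit string to int; fall back to an empty array
// when nothing matches or the regex call fails.
$numbers = preg_match_all($pattern, $text, $matches)
    ? array_map('intval', $matches[0])
    : [];

// json_encode prints a list of ints in bracketed form.
echo json_encode($numbers), PHP_EOL;
```

This prints [280,301,330,331,305,351,338,308,310].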

Upvotes: 0
