Logitope

Reputation: 25

Regex (PCRE): Match all digits conditional upon presence of a string

Using PCRE, I want to capture all of the digits (and only the digits) in a single line, but only if a certain string (say "STRING99") is present anywhere within that line.

For example, consider these two cases:

a1 STRING99 2b c3d

a1 2b c3d

In the first case, I want the result to be "19923". In the second case, I want an empty result.

I am not sure this is possible. It might work with a variable-length lookbehind, but PCRE does not support that. Furthermore, something like (?=.*STRING99.*$)(\D|(\d))* would match, but "a repeated capturing group will only capture the last iteration", meaning the second capturing group only holds the last digit. I am unable to find a workaround for this.

(This is obviously not hard to achieve with 2 consecutive Regex operations, but I want it in one formula.)
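
For reference, the two-step baseline mentioned above could look roughly like the sketch below. The helper name digitsTwoStep and its $marker parameter are assumptions made for illustration only; they do not come from any library.

```php
<?php
// Hypothetical helper: the two-operation baseline, shown only for comparison.
function digitsTwoStep(string $line, string $marker = 'STRING99'): ?string
{
    // Operation 1: check whether the marker occurs anywhere in the line.
    if (preg_match('~' . preg_quote($marker, '~') . '~', $line) !== 1) {
        return '';
    }
    // Operation 2: strip every run of non-digit characters.
    return preg_replace('~\D+~', '', $line);
}

echo digitsTwoStep('a1 STRING99 2b c3d'), PHP_EOL; // prints 19923
echo digitsTwoStep('a1 2b c3d'), PHP_EOL;          // prints an empty line
```

The goal of the question is to get the same two results from a single pattern.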

Upvotes: 2

Views: 211

Answers (3)

anubhava

Reputation: 785306

You may use this PCRE regex in preg_replace:

^(?!.*STRING99).*(*SKIP)|\D+


RegEx Details:

  • ^: Start
  • (?!.*STRING99): Negative lookahead that succeeds only if STRING99 does not appear anywhere in the input
  • .*(*SKIP): Match rest of the input till end and skip it
  • |: OR
  • \D+: Match 1+ non-digit

PHP Code:

$repl = preg_replace('~^(?!.*STRING99).*(*SKIP)|\D+~', '', $str);
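
As a quick check against the two sample lines from the question, the call can be wrapped in a small function. This is a minimal sketch that assumes each input is a single line (no embedded newlines); the name digitsViaSkip is hypothetical and used only for illustration.

```php
<?php
// Hypothetical wrapper around the pattern shown above.
function digitsViaSkip(string $line): ?string
{
    // Branch 1 consumes the whole line when STRING99 is absent, so the
    // entire line is replaced with ''. Otherwise branch 2 removes each
    // run of non-digit characters, leaving only the digits.
    return preg_replace('~^(?!.*STRING99).*(*SKIP)|\D+~', '', $line);
}

echo digitsViaSkip('a1 STRING99 2b c3d'), PHP_EOL; // prints 19923
echo digitsViaSkip('a1 2b c3d'), PHP_EOL;          // prints an empty line
```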

Upvotes: 4

Wiktor Stribiżew

Reputation: 626952

You can use

(?:\G(?!^)|^(?=.*STRING99)|^.*(*ACCEPT))\d*\K\D+

See the regex demo (where I replaced \D with [^\d\n] for demo purposes only since the test is performed against a multiline string).

Details

  • (?:\G(?!^)|^(?=.*STRING99)|^.*(*ACCEPT)) - either the end of the previous successful match (\G(?!^)) or (|) the string start position if after any 0+ chars there is a STRING99 string (see ^(?=.*STRING99)) or (else) the whole string is matched with ^.* and the successful match is returned (the pattern is not parsed further)
  • \d* - 0 or more digits are consumed
  • \K - and discarded from the match memory buffer
  • \D+ - 1 or more non-digit chars (that will be removed eventually).
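
A minimal PHP sketch of how this pattern might be applied with preg_replace, assuming a single-line input and an empty replacement string. The helper name digitsViaAccept is hypothetical and not part of any library.

```php
<?php
// Hypothetical wrapper around the \G / (*ACCEPT) pattern described above.
function digitsViaAccept(string $line): ?string
{
    $pattern = '~(?:\G(?!^)|^(?=.*STRING99)|^.*(*ACCEPT))\d*\K\D+~';
    // With STRING99 present, each match is a run of non-digits (digits
    // before it are dropped from the match by \K), so only non-digits are
    // removed. Without STRING99, the third branch matches the whole line
    // and (*ACCEPT) ends the match there, so the whole line is removed.
    return preg_replace($pattern, '', $line);
}

echo digitsViaAccept('a1 STRING99 2b c3d'), PHP_EOL; // prints 19923
echo digitsViaAccept('a1 2b c3d'), PHP_EOL;          // prints an empty line
```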

Upvotes: 1

Tim Biegeleisen

Reputation: 521629

We could use a regex replacement with a callback function:

$output = preg_replace_callback('/^.*\bSTRING99\b.*$/', function ($match) {
    // Strip every non-digit character from the matched line.
    return preg_replace('/\D+/', '', $match[0]);
}, 'a1 STRING99 2b c3d');

echo $output;   // prints 19923

The approach here is first to match only inputs containing STRING99. For such matches, we strip all non-digit characters. Inputs that do not match are never passed to the callback, so they are returned unchanged.

Upvotes: 1
