bos570

Reputation: 1523

Regex Optionally match a pattern multiple times

I have a string and I want to match a specific pattern optionally as many times as may occur.

My String
0.91 0.45 0.69 58 47 45 23 83 90 $595 NO IDL

After 45 and before $595 there could be up to 6 more numbers. How can I optionally match the repeated numbers in that space?

Here's what I have so far:

/([\d.]+) ([\d.]+) ([\d.]+)? (\d+) (\d+) (\d+)  \$(\d+)/ig 

Here are some samples with expected outputs:

0.91 0.45 0.69 58 47 45 23 83 90 $595 NO IDL
output: array([0] => 0.91, 
              [1] => 0.45, 
              [2] => 0.69, 
              [3] => 58, 
              [4] => 47, 
              [5] => 45, 
              [6] => 23, 
              [7] => 83, 
              [8] => 90, 
              [9] => 595)

0.91 0.45 0.69 58 47 45 $595 NO IDL
output: array([0] => 0.91, 
              [1] => 0.45, 
              [2] => 0.69, 
              [3] => 58, 
              [4] => 47, 
              [5] => 45, 
              [6] => 595)

0.91 0.45 0.69 0.63 58 47 45 $595 NO IDL
output: Does not match the pattern because we only want 3 of the first items to contain decimals. 

This seems to split the last number into multiple numbers. I can't figure out what's going on.

I am using PHP's preg_match for this, so I would like no empty elements in the resulting array if possible. Thanks.

Upvotes: 1

Views: 882

Answers (3)

The fourth bird

Reputation: 163237

You might first match the fixed run of numbers up to 45, which is the 6th number, and then use \G to capture the optional numbers that follow, one per match.

Explanation

  • (?:\d+\.\d+)(?: \d+\.\d+){2} Match a number with a decimal part at the start, 3 times in total
  • (?: \d+){3} Match a space followed by 1+ digits, 3 times. That will match up to and including 45
  • \s* Match zero or more whitespace characters
  • | Or
  • \G(?!^) Assert the position at the end of the previous match; the negative lookahead (?!^) rules out the start of the string
  • (\d+)\s Capture 1+ digits in group 1, then match a whitespace character

(?:\d+\.\d+)(?: \d+\.\d+){2}(?: \d+){3}\s*|\G(?!^)(\d+)\s

Regex demo

For example, a demo to extract the 3 numbers after 45:

Demo
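
The extraction described above can be sketched in PHP roughly as follows. The helper name extra_numbers is hypothetical and not part of the answer. The sketch assumes preg_match_all with its default flags, where a match that does not set group 1 (the fixed prefix) reports it as an empty string, so those entries are filtered out.

```php
<?php
// Hypothetical helper (not from the answer): collect the optional numbers
// that follow the six fixed values, using the pattern shown above.
function extra_numbers(string $line): array
{
    $rx = '~(?:\d+\.\d+)(?: \d+\.\d+){2}(?: \d+){3}\s*|\G(?!^)(\d+)\s~';
    preg_match_all($rx, $line, $m);
    // The first match (the fixed prefix) leaves group 1 empty, so drop it.
    $kept = array_filter($m[1], function ($v) { return $v !== ''; });
    return array_values($kept);
}

print_r(extra_numbers('0.91 0.45 0.69 58 47 45 23 83 90 $595 NO IDL'));
print_r(extra_numbers('0.91 0.45 0.69 58 47 45 $595 NO IDL'));
```

The first call should return 23, 83 and 90, and the second an empty array. The numbers are read from group 1 ($m[1]) rather than from the full matches, because the first full match is the fixed prefix itself.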

Upvotes: 0

Wiktor Stribiżew

Reputation: 626748

You may validate the string with a positive lookahead triggered at the start of the string, and then match all numbers from the start up to the currency value once the validation succeeds:

'~(?:\G(?!^)|^(?=\d+\.\d+ \d+\.\d+ \d+(?:\.\d+)?(?: \d+)* \$\d))\s*\$?\K\d+(?:\.\d+)?~'

See the regex demo

Details

  • (?:\G(?!^)|^(?=\d+\.\d+ \d+\.\d+ \d+(?:\.\d+)?(?: \d+)* \$\d)) - either the end of the previous match (\G(?!^)) or the start of the string (^) that is followed with
    • \d+\.\d+ - 1+ digits, a dot, and 1+ digits
    • a space
    • \d+\.\d+ - 1+ digits, a dot, and 1+ digits
    • a space
    • \d+ - 1+ digits
    • (?:\.\d+)? - an optional fractional part
    • (?: \d+)* - 0+ sequences of a space followed with 1+ digits
    • a space
    • \$\d - a $ and a digit.
  • \s* - 0+ whitespaces
  • \$? - an optional $ char
  • \K - match reset operator
  • \d+(?:\.\d+)? - an int/float number (1+ digits followed with an optional sequence of . and 1+ digits).

PHP demo:

// The three sample strings from the question
$strs = [
    '0.91 0.45 0.69 58 47 45 23 83 90 $595 NO IDL',
    '0.91 0.45 0.69 58 47 45 $595 NO IDL',
    '0.91 0.45 0.69 0.63 58 47 45 $595 NO IDL',
];
// The lookahead validates the layout once at the start; \G then chains the matches
$rx = '~(?:\G(?!^)|^(?=\d+\.\d+ \d+\.\d+ \d+(?:\.\d+)?(?: \d+)* \$\d))\s*\$?\K\d+(?:\.\d+)?~';
foreach ($strs as $s) {
    echo "$s:\n";
    if (preg_match_all($rx, $s, $matches)) {
        print_r($matches[0]); // \K keeps the leading whitespace and $ out of each match
        echo "---------\n";
    } else {
        echo "NO MATCH!!!\n---------\n";
    }
}

Output:

0.91 0.45 0.69 58 47 45 23 83 90 $595 NO IDL:
Array
(
    [0] => 0.91
    [1] => 0.45
    [2] => 0.69
    [3] => 58
    [4] => 47
    [5] => 45
    [6] => 23
    [7] => 83
    [8] => 90
    [9] => 595
)
---------
0.91 0.45 0.69 58 47 45 $595 NO IDL:
Array
(
    [0] => 0.91
    [1] => 0.45
    [2] => 0.69
    [3] => 58
    [4] => 47
    [5] => 45
    [6] => 595
)
---------
0.91 0.45 0.69 0.63 58 47 45 $595 NO IDL:
NO MATCH!!!
---------

Upvotes: 1

Mike J

Reputation: 425

This should give you the expected results:

/([\d\$.]+)/ig
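
A minimal sketch of how this pattern might be tried in PHP. The helper name all_numbers is hypothetical. The g flag is dropped because PHP's preg functions do not accept it as a modifier, and preg_match_all already returns every match. The price is matched together with its $, so the sketch strips that character afterwards.

```php
<?php
// Hypothetical helper (not from the answer): run the character-class pattern
// with preg_match_all, which returns all matches without a "g" flag.
function all_numbers(string $line): array
{
    preg_match_all('/([\d\$.]+)/i', $line, $m);
    // "$595" is matched with its dollar sign, so remove it here.
    return array_map(function ($v) { return ltrim($v, '$'); }, $m[1]);
}

print_r(all_numbers('0.91 0.45 0.69 58 47 45 $595 NO IDL'));
```

This pattern only collects runs of digits, dots and $ characters. It does not check that exactly the first three values are decimals, so the third sample from the question would still produce matches.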

Upvotes: 0
