user3362815

Reputation: 3

Capture two numbers in a string with PHP preg_match

Here is a very simple piece of code:

<?php
 $matches = array();
 $str = "You came in 21st out of 53";
 preg_match("/(\d+).*(\d+)/",$str,$matches);
 echo var_dump($matches);
 ?>

I am learning and experimenting with PHP's preg_match and regular expressions. I thought the code above would capture "21" and "53", but what it actually captures is "21" and "3".

Here is the result of echo var_dump($matches);

array(3) { 
          [0]=> string(14) "21st out of 53" 
          [1]=> string(2) "21" 
          [2]=> string(1) "3" 
         } 
  1. How do I write the regular expression to capture "53"? I hope the answer can be general enough that it also captures "153" and the "53" in "jkj53hjjk" (so the expression should not simply be changed to "/(\d+).*(\b\d+)/").

  2. If discussion is possible, why is it that when capturing the first number, it does so greedily, but when capturing the second number, it is not greedy? Is it capturing the number backwards and is therefore happy to stop at the first digit it finds? Can this be overcome?

This is my first post to Stack Overflow. I researched this question quite a bit, but I could not find an answer.

Upvotes: 0

Views: 2298

Answers (4)

mickmackusa

Reputation: 47864

sscanf() offers a few unique advantages over preg_match():

  1. There is no full-string match bloating the output array.
  2. It can return the isolated values as an indexed array or assign them to declared reference variables.
  3. Numeric substrings can be explicitly cast as non-string types.

In the format parameter below, there are two unique subpatterns:

  • %*[^0-9], which greedily matches one or more non-digit characters and then discards the matched substring because of the * (assignment suppressor), and
  • %d, which greedily matches one or more digit characters

Code: (Demo)

$str = "You came in 21st out of 53";

sscanf($str, '%*[^0-9]%d%*[^0-9]%d', $place, $total);

var_export(compact(['place', 'total']));

Or

var_export(sscanf($str, '%*[^0-9]%d%*[^0-9]%d'));

Outputs:

array (
  'place' => 21,
  'total' => 53,
)
---
array (
  0 => 21,
  1 => 53,
)

Upvotes: 0

enterx

Reputation: 879

/(\d+)[^0-9]+(\d+)/

Put a negated character class ([^0-9]+) between the two groups so that the separator cannot swallow any digits; this captures 53 as well.

/(\d+)[^0-9]+ (.*\d+.*)/

Add a space and surround \d+ with .* to capture a whole token such as sfgdfg53gfdg.
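
As a rough check of the two patterns above, here is a minimal sketch. The test strings other than the one from the question are assumptions made up for illustration. Note that the second pattern captures the whole token around the number, not just the digits.

```php
<?php
// Pattern 1: the separator may contain anything except digits.
$first  = '/(\d+)[^0-9]+(\d+)/';
// Pattern 2: after a space, capture the whole token that contains digits.
$second = '/(\d+)[^0-9]+ (.*\d+.*)/';

preg_match($first, 'You came in 21st out of 53', $a);
preg_match($first, 'You came in 21st out of jkj53hjjk', $b);          // assumed input
preg_match($second, 'You came in 21st out of sfgdfg53gfdg', $c);      // assumed input

// Show the second capture group from each match.
var_export([$a[2], $b[2], $c[2]]);
```

With these inputs the second group comes out as '53', '53' and 'sfgdfg53gfdg' respectively.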

Upvotes: 0

Marc B

Reputation: 360582

That's normal, by documented design: http://php.net/preg_match :

If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.

If you don't want the full match string, then just array_shift() it off the $matches array.
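
A minimal sketch of dropping the full match with array_shift(). The pattern used here is only an assumption for illustration (any pattern with two capture groups behaves the same way):

```php
<?php
$str = "You came in 21st out of 53";

// $matches[0] holds the full match, $matches[1] and $matches[2] the groups.
preg_match('/(\d+)\D+(\d+)/', $str, $matches);

// Remove the full match; the remaining numeric keys are renumbered from 0.
array_shift($matches);

var_export($matches);
```

After the shift, $matches contains only the two captured numbers, at indexes 0 and 1.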

Upvotes: 0

CAustin

Reputation: 4614

The problem is that your .* is greedy and grabbing as many characters as it can before the regex needs to match the second \d+ in order to resolve. Adding a ? should allow it to work as intended:

(\d+).*?(\d+)
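
To see why the original pattern ends up with "3": the greedy .* first runs to the end of the string, then backtracks one character at a time until \d+ can match, and a single digit is enough. A minimal sketch comparing the two patterns follows; the helper name captureTwoNumbers and the extra test strings are hypothetical, added only for illustration.

```php
<?php
// Hypothetical helper: returns the two captured numbers, or null if no match.
function captureTwoNumbers(string $pattern, string $str): ?array
{
    if (preg_match($pattern, $str, $m) !== 1) {
        return null;
    }
    return [$m[1], $m[2]];
}

$greedy = '/(\d+).*(\d+)/';   // .* takes as much as it can, leaving one digit
$lazy   = '/(\d+).*?(\d+)/';  // .*? takes as little as it can

$str = "You came in 21st out of 53";

var_export(captureTwoNumbers($greedy, $str)); // second element is '3'
echo "\n";
var_export(captureTwoNumbers($lazy, $str));   // second element is '53'
echo "\n";

// The lazy version also handles the other cases from the question.
var_export(captureTwoNumbers($lazy, "You came in 21st out of 153"));
echo "\n";
var_export(captureTwoNumbers($lazy, "You came in 21st out of jkj53hjjk"));
```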

Upvotes: 1
