Reputation: 3
Here is a very simple piece of code:
<?php
$matches = array();
$str = "You came in 21st out of 53";
preg_match("/(\d+).*(\d+)/",$str,$matches);
echo var_dump($matches);
?>
I am learning and experimenting with PHP's preg_match and regular expressions. I thought the code above would capture "21" and "53", but what it actually captures is "21" and "3".
Here is the result of echo var_dump($matches);
array(3) {
[0]=> string(14) "21st out of 53"
[1]=> string(2) "21"
[2]=> string(1) "3"
}
How do I write the regular expression to capture "53"? I hope the answer can be general enough that it could also capture "153" and "jkj53hjjk" (so that the expression is not simply changed to "/(\d+).*(\b\d+)/").
If discussion is possible, why is it that when capturing the first number, it does so greedily, but when capturing the second number, it is not greedy? Is it capturing the number backwards and is therefore happy to stop at the first digit it finds? Can this be overcome?
This is my first post to Stack Overflow. I researched this question quite a bit, but I could not find an answer.
Upvotes: 0
Views: 2298
Reputation: 47864
sscanf() offers a few unique advantages over preg_match() here: it returns no full-match element, it casts the %d matches to integers, and it can assign the results directly to variables.
In the format parameter below, there are two unique subpatterns:
%*[^0-9], which means greedily match one or more non-digit characters, then ignore the matched substring because of the * (assignment suppressor), and
%d, which means greedily match one or more digit characters.
Code:
$str = "You came in 21st out of 53";
sscanf($str, '%*[^0-9]%d%*[^0-9]%d', $place, $total);
var_export(compact(['place', 'total']));
Or
var_export(sscanf($str, '%*[^0-9]%d%*[^0-9]%d'));
Outputs:
array (
'place' => 21,
'total' => 53,
)
---
array (
0 => 21,
1 => 53,
)
Upvotes: 0
Reputation: 879
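/(\d+)[^0-9]+(\d+)/
Put a negated character class ([^0-9]+) between the two groups instead of .* so that the second group matches 53 in full.
/(\d+)[^0-9]+ (.*\d+.*)/
Add a space and surround \d+ with .* to match a whole token such as sfgdfg53gfdg.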
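As a rough check of the two patterns above, the sketch below runs both against a few sample strings. The helper name capture_pair() and the sample inputs are made up for illustration; only preg_match() itself is standard PHP.

```php
<?php
// Hypothetical helper (not from the answer): returns the two captured
// groups as an array, or null when the pattern does not match.
function capture_pair(string $pattern, string $str): ?array
{
    if (preg_match($pattern, $str, $m) !== 1) {
        return null;
    }
    return [$m[1], $m[2]];
}

$digitsOnly = '/(\d+)[^0-9]+(\d+)/';       // second group holds only the digits
$wholeToken = '/(\d+)[^0-9]+ (.*\d+.*)/';  // second group holds the whole token

var_export(capture_pair($digitsOnly, "You came in 21st out of 53"));        // '21', '53'
var_export(capture_pair($digitsOnly, "You came in 21st out of jkj53hjjk")); // '21', '53'
var_export(capture_pair($wholeToken, "You came in 21st out of jkj53hjjk")); // '21', 'jkj53hjjk'
```

Note that the first pattern already skips leading letters such as jkj, because [^0-9]+ consumes every non-digit up to the next number. The second pattern is only needed if the surrounding letters should be part of the capture.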
Upvotes: 0
Reputation: 360582
That's normal, by documented design: http://php.net/preg_match :
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
If you don't want the full match string, then just array_shift() it off the $matches array.
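A minimal sketch of that suggestion follows. The pattern is borrowed from another answer in this thread purely so the example captures both numbers; it is an assumption, not part of this answer.

```php
<?php
$str = "You came in 21st out of 53";

// Pattern chosen for illustration only.
preg_match('/(\d+)[^0-9]+(\d+)/', $str, $matches);

// $matches[0] is the full match ("21st out of 53"); array_shift() removes it
// and reindexes the remaining numeric keys from 0.
array_shift($matches);

var_export($matches); // 0 => '21', 1 => '53'
```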
Upvotes: 0
Reputation: 4614
The problem is that your .* is greedy: it grabs as many characters as it can and then gives back only what the second \d+ needs in order to match, which is a single digit. Adding a ? makes it lazy, which should allow it to work as intended:
(\d+).*?(\d+)
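To see the difference, here is a small sketch comparing the greedy and lazy forms on the inputs mentioned in the question. The function name second_number() is hypothetical and exists only for this example.

```php
<?php
// Hypothetical helper: returns the second captured group, or null on no match.
function second_number(string $pattern, string $str): ?string
{
    return preg_match($pattern, $str, $m) === 1 ? $m[2] : null;
}

$greedy = '/(\d+).*(\d+)/';   // .* takes everything, then backtracks one digit
$lazy   = '/(\d+).*?(\d+)/';  // .*? expands only until the next digit appears

$inputs = [
    "You came in 21st out of 53",
    "You came in 21st out of 153",
    "You came in 21st out of jkj53hjjk",
];

foreach ($inputs as $str) {
    echo $str, " => greedy: ", second_number($greedy, $str),
         ", lazy: ", second_number($lazy, $str), "\n";
}
// greedy yields "3" for every input; lazy yields "53", "153" and "53"
```

The second \d+ is greedy in both patterns. In the first one it simply has only one digit left to work with, because .* has already consumed the rest.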
Upvotes: 1