I'm trying to use a regex to match all the fractions or 'evs', and the strings (string1, string2), in the following string. The strings may contain any amount of whitespace ('String 1', 'The String 1', 'The String Number 1').
10/3 string1 evs string2 8/5 mon 19:45 string1 v string2 1/1 string1 v string2 1/1
The following regex works in JavaScript but not in PHP. No errors are returned, just 0 results.
(\d{1,3}\/\d{1,3}|evs).*?(.+).*?(\d{1,3}\/\d{1,3}|evs).*?(.+).*?(\d{1,3}\/\d{1,3}|evs).*?(.+) v (.+).*?(\d{1,3}\/\d{1,3}|evs).*?(.+) v (.+).*?(\d{1,3}\/\d{1,3}|evs)
Here's the expected result, apart from groups 6 and 7 (run in JavaScript):
If I add a ? to the first (.+) so that it becomes (.+?), I get the desired result, but with the first string not captured:
As soon as I remove the ? to capture the whole string, no results are returned. Can somebody work out what's going on here?
In PCRE/PHP, you may use
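$text = '10/3 string1 evs string2 8/5 mon 19:45 string1 v string2 1/1 string1 v string2 1/1';
// The pattern needs delimiters (~ here); (?1) re-runs the Group 1 pattern
$regex = '~(\d{1,3}\/\d{1,3}|evs)\s+(\S+)\s+((?1))\s+(\S+)\s+((?1))\s+(.+?)\s+v\s+(\S+)\s+((?1))\s+(\S+)\s+v\s+(\S+)\s+((?1))~';
if (preg_match_all($regex, $text, $matches)) {
    print_r($matches[0]);
}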
See the regex demo
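If you need the individual fields rather than only the full match, a minimal sketch like the one below (assuming PHP 7+ and the sample string from the question) passes the PREG_SET_ORDER flag, so each match comes back as one array with the full match at index 0 and Groups 1 to 11 after it. The variable names are illustrative only.

```php
<?php
// Sample input taken from the question.
$text = '10/3 string1 evs string2 8/5 mon 19:45 string1 v string2 1/1 string1 v string2 1/1';
// Same pattern as in the answer, wrapped in ~ delimiters.
$regex = '~(\d{1,3}\/\d{1,3}|evs)\s+(\S+)\s+((?1))\s+(\S+)\s+((?1))\s+(.+?)\s+v\s+(\S+)\s+((?1))\s+(\S+)\s+v\s+(\S+)\s+((?1))~';

// PREG_SET_ORDER: $sets[n] holds match n; index 0 is the whole match, 1..11 are the groups.
$count = preg_match_all($regex, $text, $sets, PREG_SET_ORDER);

foreach ($sets as $set) {
    // Skip index 0 (the full match) and print each capturing group.
    for ($i = 1; $i < count($set); $i++) {
        echo "Group $i: {$set[$i]}\n";
    }
}
```

For the sample string this gives a single match in which Group 6 is "mon 19:45 string1", since (.+?) is the only field in this pattern that is allowed to contain spaces.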
The point is that you can't overuse .*? / .+ in the middle of the pattern: that leads to catastrophic backtracking.
You need precise patterns to match the whitespace and non-whitespace fields, and should only use .*? / .+? where a field can contain any mix of whitespace and non-whitespace chars.
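When PCRE gives up on a runaway pattern, preg_match_all() returns false without emitting a warning, so a "0 results" symptom like the one in the question is easy to mistake for a genuine non-match. The sketch below is a hypothetical helper (not part of the original answer) that tells the two cases apart; it assumes PHP 8.0+ for preg_last_error_msg(). Which limit gets reported (backtracking or JIT stack) depends on the PHP build and the pcre.jit setting.

```php
<?php
// Hypothetical helper: distinguishes "no match" from "the regex could not run to completion".
function describeMatchAll(string $regex, string $text): string
{
    $result = preg_match_all($regex, $text, $matches);

    if ($result === false) {
        // false is not "no match": the pattern failed to compile or hit an engine limit
        // (e.g. the backtrack limit or the JIT stack limit).
        return 'error: ' . preg_last_error_msg();
    }

    return $result === 0 ? 'no match' : "$result match(es)";
}

$text = '10/3 string1 evs string2 8/5 mon 19:45 string1 v string2 1/1 string1 v string2 1/1';
$regex = '~(\d{1,3}\/\d{1,3}|evs)\s+(\S+)\s+((?1))\s+(\S+)\s+((?1))\s+(.+?)\s+v\s+(\S+)\s+((?1))\s+(\S+)\s+v\s+(\S+)\s+((?1))~';

echo describeMatchAll($regex, $text), "\n";
```

Passing the question's original pattern (wrapped in delimiters) to the same helper shows whether your PHP build actually reports a limit error, instead of just seeing 0 results.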
Details
(\d{1,3}\/\d{1,3}|evs) - Group 1 (its pattern can later be reused with the (?1) subroutine call): one to three digits, /, then one to three digits, or evs
\s+(\S+)\s+ - 1+ whitespaces, Group 2 matching 1+ non-whitespace chars, 1+ whitespaces
((?1)) - Group 3, which matches the same way the Group 1 pattern does
\s+(\S+)\s+((?1))\s+ - 1+ whitespaces, Group 4 matching 1+ non-whitespaces, 1+ whitespaces, Group 5 with the Group 1 pattern, 1+ whitespaces
(.+?) - Group 6: one or more chars other than line break chars, as few as possible
\s+v\s+ - v enclosed with 1+ whitespaces
(\S+) - Group 7: 1+ non-whitespaces
\s+((?1))\s+ - 1+ whitespaces, Group 8 with the Group 1 pattern, 1+ whitespaces
(\S+) - Group 9: 1+ non-whitespaces
\s+v\s+ - v enclosed with 1+ whitespaces
(\S+)\s+((?1)) - Group 10: 1+ non-whitespaces, then 1+ whitespaces and Group 11 with the Group 1 pattern.
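The (?1) subroutine call is what lets Groups 3, 5, 8 and 11 reuse the fraction-or-evs pattern without depending on the text Group 1 actually captured. A small sketch (the anchored patterns are simplified for illustration) contrasts it with a \1 backreference: (?1) re-runs Group 1's pattern, while \1 requires exactly the text Group 1 captured.

```php
<?php
// (?1) re-runs Group 1's *pattern*; \1 must repeat Group 1's captured *text*.
$subroutine = '~^(\d{1,3}/\d{1,3}|evs) ((?1))$~';
$backreference = '~^(\d{1,3}/\d{1,3}|evs) (\1)$~';

foreach (['10/3 evs', '10/3 10/3'] as $sample) {
    $viaSubroutine = preg_match($subroutine, $sample);
    $viaBackref = preg_match($backreference, $sample);
    echo "$sample: subroutine=$viaSubroutine backreference=$viaBackref\n";
}
```

With \1 in place of (?1), the answer's pattern would only match when all five fraction/evs fields were identical.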