Reputation: 4377
I'm trying to match the following patterns and build an array as described below:
the letter 'c' followed by a number
the letter 'c' followed by a number, a dash, and another number
any number may be followed by another number enclosed in square brackets []
Patterns are separated by commas.
Example:
c2,c3-5,c6[2],c8[4]-10,c14-21[5]
These numbers refer to paragraphs of articles of law; a dash indicates a range of paragraphs.
So for example:
c3-5 = paragraphs from 3 to 5
With the following regex I can match and separate the numbers:
(\d+(\[\d+\])?-\d+(\[\d+\])?)|(\d+(\[\d+\])?)
https://regex101.com/r/iQ2pQ3/1
But to use these numbers effectively, I'm trying (without success) to build an array with the following structure:
Array
(
[0] => Array
(
[start] => 2
[end] =>
)
[1] => Array
(
[start] => 3
[end] => 5
)
[2] => Array
(
[start] => 6[2]
[end] =>
)
[3] => Array
(
[start] => 8[4]
[end] => 10
)
[4] => Array
(
[start] => 14
[end] => 21[5]
)
)
As you can see, single matches are added to the array under the key [start]; when there is a dash (a range), the first number is added under the key [start] and the second under the key [end].
The only approach I could think of is to first explode the string by commas and then apply a regex to each of the exploded strings. Even then, I don't know how to build an array like the one above.
Is there a better (more compact and elegant) way to do it?
Upvotes: 1
Views: 27
Reputation: 627128
Use the following regex based solution (see demo):
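// Group "start" captures the first number; "end" reuses the same subpattern via (?&start)
$re = '~c(?<start>\d+(?:\[\d+])?)(?:-(?<end>(?&start)?))?~';
$str = "c2,c3-5,c6[2],c8[4]-10,c14-21[5]";
preg_match_all($re, $str, $matches);
// Pair each "start" capture with its "end" capture (an empty string when there is no range)
$res = array_map(function($ms, $me) {
    return array("start" => $ms, "end" => $me);
}, $matches["start"], $matches["end"]);
print_r($res);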
The regex is similar to anubhava's, but I shortened it with the help of a named subroutine call (which re-executes, i.e. reuses, the start subpattern):
c(?<start>\d+(?:\[\d+])?)(?:-(?<end>(?&start)?))?
See the regex demo, here is its explanation:
c - a literal c
(?<start>\d+(?:\[\d+])?) - (group named "start") an obligatory subpattern: \d+ matches 1 or more digits, optionally followed by one occurrence of [ + digits + ]
(?:-(?<end>(?&start)?))? - 1 or 0 occurrences (i.e. an optional sequence) of - followed by the "start" subpattern (the value is placed into the "end" group).
Upvotes: 1
Reputation: 785651
You can modify your regex to this to capture empty matches as well:
c(?P<start>\d+(?:\[\d+\])?)-?(?P<end>\d+(?:\[\d+\])?|)(?=,|$)
(?P<end>\d+(?:\[\d+\])?|) ensures that an empty match is also captured in the end group.
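As a rough sketch of how this pattern could be turned into the array from the question: the ~ delimiters, the PREG_SET_ORDER flag, and the variable names below are assumptions added for illustration and are not part of the answer above.

```php
<?php
// Pattern from the answer above, wrapped in ~ delimiters (an assumption) for preg_match_all.
$re  = '~c(?P<start>\d+(?:\[\d+\])?)-?(?P<end>\d+(?:\[\d+\])?|)(?=,|$)~';
$str = 'c2,c3-5,c6[2],c8[4]-10,c14-21[5]';

// PREG_SET_ORDER returns one sub-array per match instead of one per group.
preg_match_all($re, $str, $matches, PREG_SET_ORDER);

$res = [];
foreach ($matches as $m) {
    // The empty alternative in "end" means the group always participates,
    // so $m['end'] is set (to an empty string) even for single references.
    $res[] = ['start' => $m['start'], 'end' => $m['end']];
}

print_r($res);
```

Because the end group contains an empty alternative, single references such as c2 should come back with an empty string under the end key rather than a missing key.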
Upvotes: 0