Add regex matches to a two-dimensional array based on their position

Question

I'm trying to match the following pattern and create an array as described below:

letter 'c' followed by digit
letter 'c' followed by digit dash digit
digit may be followed by another digit enclosed in square brackets []

Patterns are separated by comma.

Example:

c2,c3-5,c6[2],c8[4]-10,c14-21[5]

These numbers are references to paragraphs of articles of laws and where there is a dash it means that's a range of paragraphs.

So for example:

c3-5 = paragraphs from 3 to 5

With the following regex I can match and separate the numbers:

(\d+(\[\d+\])?-\d+(\[\d+\])?)|(\d+(\[\d+\])?)

https://regex101.com/r/iQ2pQ3/1

But to use these numbers effectively I'm trying to build (so far without success) an array with the following structure:

Array 
(
    [0] => Array
    (
        [start] => 2
        [end] =>
    )
    [1] => Array
    (
        [start] => 3
        [end] => 5
    )
    [2] => Array
    (
        [start] => 6[2]
        [end] =>
    )
    [3] => Array
    (
        [start] => 8[4]
        [end] => 10
    )
    [4] => Array
    (
        [start] => 14
        [end] => 21[5]
    )
)

You may see that single matches are added to the array with the key [start], when there's a dash (a range) the first digit is added with the key [start] and the second with the key [end].

The only way I could think of is to first explode the string by comma and then run a regex on each exploded string. Even then, I don't know how to build an array like the one above.

Is there a better (more compact and elegant) way to do it?

Wiktor Stribiżew · Accepted Answer

Use the following regex-based solution (see demo):

$re = '~c(?<start>\d+(?:\[\d+])?)(?:-(?<end>(?&start)?))?~';
$str = "c2,c3-5,c6[2],c8[4]-10,c14-21[5]";
preg_match_all($re, $str, $matches);
// Pair each "start" capture with the "end" capture at the same position
$res = array_map(function($ms, $me) {
    return array("start" => $ms, "end" => $me);
}, $matches["start"], $matches["end"]);
print_r($res);

The regex is similar to anubhava's, but I shortened it with the help of a named subroutine call (which reuses the start subpattern rather than repeating it):

c(?<start>\d+(?:\[\d+])?)(?:-(?<end>(?&start)?))?

See the regex demo, here is its explanation:

c - a literal c
(?<start>\d+(?:\[\d+])?) - (Group named "start") an obligatory subpattern: \d+ matches 1+ digits, optionally followed by 1 occurrence of [ + digits + ]
(?:-(?<end>(?&start)?))? - 1 or 0 (optional) occurrences of - followed by the "start" subpattern (the value is placed into the "end" group).
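
As a variation on the same idea, the pattern can also be used with the PREG_SET_ORDER flag, which groups the captures per match so the result can be built in a single loop without array_map. This is only a sketch: the function name parseParagraphRefs is hypothetical and not part of the answer above, and it assumes that an empty string is an acceptable value for a missing "end".

```php
<?php
// Hypothetical helper (name chosen for illustration only): turns a string
// such as "c2,c3-5" into a list of ["start" => ..., "end" => ...] pairs.
function parseParagraphRefs(string $refs): array
{
    // Same pattern as above: "start" is required, "end" is optional and
    // reuses the "start" subpattern through the (?&start) subroutine call.
    $re = '~c(?<start>\d+(?:\[\d+])?)(?:-(?<end>(?&start)?))?~';

    // PREG_SET_ORDER returns one array per match instead of one per group.
    preg_match_all($re, $refs, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        $result[] = [
            'start' => $m['start'],
            // The "end" key may be missing when the item is not a range,
            // so fall back to an empty string (an assumption of this sketch).
            'end'   => $m['end'] ?? '',
        ];
    }
    return $result;
}

print_r(parseParagraphRefs('c2,c3-5,c6[2],c8[4]-10,c14-21[5]'));
```

If null is preferable to an empty string for single paragraphs, the fallback in the loop can be changed accordingly.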
