Nicero

Reputation: 4377

Add regex matches to a two-dimensional array based on their position

I'm trying to match the following pattern and create an array as described below:

Patterns are separated by comma.

Example:

c2,c3-5,c6[2],c8[4]-10,c14-21[5]

These numbers refer to paragraphs of articles of law; a dash indicates a range of paragraphs.

So for example:

c3-5 = paragraphs from 3 to 5

With the following regex I can match and separate the numbers:

(\d+(\[\d+\])?-\d+(\[\d+\])?)|(\d+(\[\d+\])?)

https://regex101.com/r/iQ2pQ3/1

But to use these numbers effectively, I'm trying to build, so far without success, an array with the following structure:

Array 
(
    [0] => Array
    (
        [start] => 2
        [end]=> 
    )
    [1] => Array
    (
        [start] => 3
        [end] => 5
    )
    [2] => Array
    (
        [start] => 6[2]
        [end] =>
    )
    [3] => Array
    (
        [start] => 8[4]
        [end] => 10
    )
    [4] => Array
    (
        [start] => 14
        [end] => 21[5]
    )
)

As you can see, a single match is added to the array under the key [start]. When there is a dash (a range), the first number is added under the key [start] and the second under the key [end].

The only way I can think of is to first explode the string by comma and then apply a regex to each exploded piece. Even then, I don't know how to build an array like the one above.

Is there a better (more compact and elegant) way to do it?

Upvotes: 1

Views: 27

Answers (2)

Wiktor Stribiżew

Reputation: 627128

Use the following regex based solution (see demo):

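// "start" captures digits plus an optional [n]; "end" reuses that subpattern via (?&start).
$re = '~c(?<start>\d+(?:\[\d+])?)(?:-(?<end>(?&start)?))?~';
$str = "c2,c3-5,c6[2],c8[4]-10,c14-21[5]";
preg_match_all($re, $str, $matches);
// Pair up the parallel "start" and "end" columns, one entry per match.
$res = array_map(function($ms, $me) {
    return array("start" => $ms, "end" => $me);
}, $matches["start"], $matches["end"]);
print_r($res);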

The regex is similar to anubhava's, but I shortened it with the help of a named subroutine call, which reuses the start subpattern without repeating it:

c(?<start>\d+(?:\[\d+])?)(?:-(?<end>(?&start)?))?

See the regex demo. Here is its explanation:

  • c - a literal c
  • (?<start>\d+(?:\[\d+])?) - (Group named "start") an obligatory subpattern: \d+ matches 1+ digits, optionally followed by 1 occurrence of [ + digits + ]
  • (?:-(?<end>(?&start)?))? - 1 or 0 (optional) occurrences of - followed by the same subpattern as the "start" group, reused via (?&start) (the value is placed into the "end" group).
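
As a rough illustration of the explanation above, the same pattern can also be run with PREG_SET_ORDER, so that each match arrives as its own array. This is only a sketch: the helper name parse_refs is hypothetical and not part of the original answer, and the `?? ""` fallback is there because a trailing group that did not take part in a match may be absent from the per-match array.

```php
<?php
// Hypothetical helper (not from the original answer): returns one
// ["start" => ..., "end" => ...] pair per reference found in $str.
function parse_refs(string $str): array
{
    // Same pattern as above; (?&start) reuses the "start" subpattern.
    $re = '~c(?<start>\d+(?:\[\d+])?)(?:-(?<end>(?&start)?))?~';
    preg_match_all($re, $str, $matches, PREG_SET_ORDER);

    $res = [];
    foreach ($matches as $m) {
        // Default "end" to an empty string when the optional range part
        // did not match and the key is therefore missing.
        $res[] = ["start" => $m["start"], "end" => $m["end"] ?? ""];
    }
    return $res;
}

print_r(parse_refs("c2,c3-5,c6[2],c8[4]-10,c14-21[5]"));
```

Wrapping the pattern in a function keeps the array-building logic in one place if the same reference format has to be parsed in several spots.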

Upvotes: 1

anubhava

Reputation: 785651

You can modify your regex to this to capture empty matches as well:

c(?P<start>\d+(?:\[\d+\])?)-?(?P<end>\d+(?:\[\d+\])?|)(?=,|$)

RegEx Demo

(?P<end>\d+(?:\[\d+\])?|) ensures we also capture empty matches in the end group: the empty alternative after | lets the group match nothing when there is no range.
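
A minimal sketch of how this pattern might be used to build the array from the question, assuming the same input string; the variable names and the ~ delimiters are illustrative choices, not part of the answer above.

```php
<?php
// The pattern from this answer, wrapped in ~ delimiters for preg_match_all.
$re = '~c(?P<start>\d+(?:\[\d+\])?)-?(?P<end>\d+(?:\[\d+\])?|)(?=,|$)~';
$str = "c2,c3-5,c6[2],c8[4]-10,c14-21[5]";

// PREG_SET_ORDER yields one array per match instead of one array per group.
preg_match_all($re, $str, $matches, PREG_SET_ORDER);

$res = [];
foreach ($matches as $m) {
    // "end" takes part in every match (as an empty string when there is
    // no range), so the key is expected to be set for each entry.
    $res[] = ["start" => $m["start"], "end" => $m["end"]];
}
print_r($res);
```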

Upvotes: 0
