bambamboole

Reputation: 587

Convert string with no delimiters into associative multidimensional array

I need to parse a string that has no delimiting character to form an associative array.

Here is an example string:

*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times

Every "key" (which precedes its "value") consists of an asterisk (*) followed by two alphanumeric characters. I use this regex pattern: /\*[A-Z0-9]{2}/
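As a quick check of that pattern, the short sketch below collects every key with preg_match_all(). It assumes the sample string is held in $line, as in the preg_split() call that follows. The result shows that *AE occurs twice, which is why the desired output needs a nested array for that key.

```php
<?php
// Sample string from the question.
$line = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';

// Same key pattern as above: an asterisk followed by two uppercase letters or digits.
// preg_match_all() stores every full match in $keys[0].
preg_match_all('/\*[A-Z0-9]{2}/', $line, $keys);

print_r($keys[0]);
```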

This is my preg_split() call:

$attributes = preg_split('/\*[A-Z0-9]{2}/', $line);

This works to isolate the "value", but I also need to extract the "key" to form my desired associative array.

What I get looks like this (the first element is an empty string because the input starts with a key):

$attributes = [
    0 => '',
    1 => 'the title',
    2 => 'the author',
    3 => 'other useless infos',
    4 => 'other useful infos',
    5 => 'some delimiters can be there multiple times'
];

My desired output is:

$matches = [
    '*01' => 'the title',
    '*35' => 'the author',
    '*A7' => 'other useless infos',
    '*AE' => [
        'other useful infos',
        'some delimiters can be there multiple times',
    ],
];

Upvotes: 2

Views: 1123

Answers (4)

mickmackusa

Reputation: 47894

Here is a functional-style approach that doesn't require duplicate-keyed values to be consecutively written in the input string.

  • Use preg_match_all() to isolate the two components of each subexpression in the input string.
  • Use array_map() to replace each row of indexed match values with a single, associative element.
  • Use the spread operator (...) to unpack the modified matches array into individual associative arrays and pass them to array_merge_recursive(). array_merge_recursive() only creates a subarray when the same string key occurs more than once.
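The key property relied on in the last step can be seen in isolation. The sketch below merges three single-element arrays built by hand from the sample data: the key that occurs once keeps its plain string, and the repeated *AE key is collected into a list. This holds for string keys like '*AE'; integer keys are treated differently (their values are appended and renumbered), which does not arise here because every key starts with *.

```php
<?php
// Three single-element associative arrays, as array_map() would produce them.
$merged = array_merge_recursive(
    ['*01' => 'the title'],
    ['*AE' => 'other useful infos'],
    ['*AE' => 'some delimiters can be there multiple times']
);

// '*01' stays a string; '*AE' becomes a two-element list.
var_export($merged);
```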

Code: (Demo)

$str = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';

var_export(
    array_merge_recursive(
        ...array_map(
            fn($row) => [$row[1] => $row[2]],
            preg_match_all(
                '/(\*[A-Z\d]{2})(.+?)(?=$|\*[A-Z\d]{2})/',
                $str,
                $m,
                PREG_SET_ORDER
            ) ? $m : []
        )
    )
);

Output:

array (
  '*01' => 'the title',
  '*35' => 'the author',
  '*A7' => 'other useless infos',
  '*AE' => 
  array (
    0 => 'other useful infos',
    1 => 'some delimiters can be there multiple times',
  ),
)

Upvotes: 0

Markus Ankenbrand

Reputation: 523

Use the PREG_SPLIT_DELIM_CAPTURE flag of the preg_split function to also get the captured delimiter (see documentation).

So in your case:

# The -1 is the limit parameter (no limit)
$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);

Element 0 of $attributes now holds everything before the first delimiter, and after it the captured delimiters and the values that follow them alternate. Assuming you do not want to keep that first element, you can build your $matches array like this:

$matches = [];
// Keys sit at the odd indices, each followed by its value.
for ($i = 1; $i < count($attributes) - 1; $i += 2) {
    $matches[$attributes[$i]] = $attributes[$i + 1];
}
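For the sample string in the question, the split result can be inspected directly. In this sketch $line is assumed to hold that string. Because the string begins with a key, element 0 is an empty string, the keys land at odd indices, and each value follows its key.

```php
<?php
$line = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';

// The parentheses around the key pattern make PREG_SPLIT_DELIM_CAPTURE keep each key.
$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($attributes);
```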

To handle delimiters that occur multiple times, adjust the line inside the for loop so that it checks whether the key already exists and, if so, turns its value into an array.

Edit: one way to create an array only when necessary is this code:

$matches = [];
for ($i = 1; $i < count($attributes) - 1; $i += 2) {
    $key = $attributes[$i];
    if (array_key_exists($key, $matches)) {
        // Repeated key: wrap the existing string in an array first, then append.
        if (!is_array($matches[$key])) {
            $matches[$key] = [$matches[$key]];
        }
        $matches[$key][] = $attributes[$i + 1];
    } else {
        $matches[$key] = $attributes[$i + 1];
    }
}

The downstream code can certainly be simplified, especially if you put all values into (possibly single-element) arrays.
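One way to read that suggestion is sketched below, under the assumption that the consuming code can accept a list for every key. Every value is appended to an array, so a key seen once simply yields a one-element array. Note that this departs from the exact output asked for in the question: 'the title' becomes ['the title'].

```php
<?php
$line = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';
$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);

// Every key maps to a list, even when it occurs only once,
// so no array_key_exists()/is_array() branching is needed.
$matches = [];
for ($i = 1; $i < count($attributes) - 1; $i += 2) {
    $matches[$attributes[$i]][] = $attributes[$i + 1];
}

print_r($matches);
```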

Upvotes: 2

Wiktor Stribiżew

Reputation: 626794

You may match and capture each key into Group 1 and all the text up to the next key that differs from the captured one into Group 2. Then, in a loop, go through the keys and values and split each value on the delimiter pattern wherever it still contains it, i.e. where the same key appeared several times in a row.

The regex is

(\*[A-Z0-9]{2})(.*?)(?=(?!\1)\*[A-Z0-9]{2}|$)

See the regex demo.

Details

  • (\*[A-Z0-9]{2}) - Delimiter, Group 1: a * and two uppercase letters or digits
  • (.*?) - Value, Group 2: any 0+ chars other than line break chars, as few as possible
  • (?=(?!\1)\*[A-Z0-9]{2}|$) - up to the delimiter pattern (\*[A-Z0-9]{2}) that is not equal to the text captured in Group 1 ((?!\1)) or end of string ($).
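To see what the two groups capture before any splitting, the matches can be printed on their own. This sketch reuses the pattern and the sample string. Because the lookahead does not stop at a repeat of the same key, the last Group 2 still contains the inner *AE delimiter, and that is what the preg_split() step later breaks apart. It also means only consecutive repeats of a key are merged; a repeat separated by a different key would produce a second match whose value overwrites the first.

```php
<?php
$re  = '/(\*[A-Z0-9]{2})(.*?)(?=(?!\1)\*[A-Z0-9]{2}|$)/';
$str = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';

preg_match_all($re, $str, $m, PREG_SET_ORDER);

// Print each match as "Group 1 => Group 2".
foreach ($m as $match) {
    echo $match[1], ' => ', $match[2], PHP_EOL;
}
```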

See the PHP demo:

$re = '/(\*[A-Z0-9]{2})(.*?)(?=(?!\1)\*[A-Z0-9]{2}|$)/';
$str = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';
$res = [];
if (preg_match_all($re, $str, $m, PREG_SET_ORDER, 0)) {
    foreach ($m as $kvp) {
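        // Use exactly two characters, as in the key pattern; with + a value
        // starting with an uppercase letter or digit would lose that character.
        $tmp = preg_split('~\*[A-Z0-9]{2}~', $kvp[2]);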
        if (count($tmp) > 1) {
            $res[$kvp[1]] = $tmp;
        } else {
            $res[$kvp[1]] = $kvp[2];
        }
    }
    print_r($res);
}

Output:

Array
(
    [*01] => the title
    [*35] => the author
    [*A7] => other useless infos
    [*AE] => Array
        (
            [0] => other useful infos
            [1] => some delimiters can be there multiple times
        )

)

Upvotes: 1

bambamboole

Reputation: 587

OK, I'm answering my own question on how to handle repeated delimiters. Thanks to @markus-ankenbrand for the starting point:

$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);
$matches = [];
for ($i = 1; $i < sizeof($attributes) - 1; $i += 2) {
    if (isset($matches[$attributes[$i]]) && is_array($matches[$attributes[$i]])) {
        $matches[$attributes[$i]][] = $attributes[$i + 1];
    } elseif (isset($matches[$attributes[$i]]) && !is_array($matches[$attributes[$i]])) {
        $currentValue = $matches[$attributes[$i]];
        $matches[$attributes[$i]] = [$currentValue];
        $matches[$attributes[$i]][] = $attributes[$i + 1];
    } else {
        $matches[$attributes[$i]] = $attributes[$i + 1];
    }
}

The fat if/else statement does not look really nice, but it does what it needs to do.
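If the goal is just a shorter loop body, the three branches can be collapsed with an (array) cast, which turns a single string into a one-element array and leaves an existing array unchanged. This is a sketch, not the method used above, with the same $line and preg_split() call; it produces the structure asked for in the question.

```php
<?php
$line = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';
$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);

$matches = [];
for ($i = 1; $i < count($attributes) - 1; $i += 2) {
    $key   = $attributes[$i];
    $value = $attributes[$i + 1];

    // First occurrence: store the plain string.
    // Later occurrences: cast the existing entry to an array and append the new value.
    $matches[$key] = isset($matches[$key])
        ? array_merge((array) $matches[$key], [$value])
        : $value;
}

var_export($matches);
```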

Upvotes: 0
