Reputation: 587
I need to parse a string that has no delimiting character to form an associative array.
Here is an example string:
*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times
Every "key" (which precedes its "value") consists of an asterisk (*) followed by two uppercase alphanumeric characters.
I use this regex pattern: /\*[A-Z0-9]{2}/
This is my preg_split() call:
$attributes = preg_split('/\*[A-Z0-9]{2}/', $line);
This works to isolate the "value", but I also need to extract the "key" to form my desired associative array.
What I get looks like this:
$matches = [
0 => 'the title',
1 => 'the author',
2 => 'other useless infos',
3 => 'other useful infos',
4 => 'some delimiters can be there multiple times'
];
My desired output is:
$matches = [
'*01' => 'the title',
'*35' => 'the author',
'*A7' => 'other useless infos',
'*AE' => [
'other useful infos',
'some delimiters can be there multiple times',
],
];
Upvotes: 2
Views: 1123
Reputation: 47894
Here is a functional-style approach that doesn't require duplicate-keyed values to be consecutively written in the input string.
Use preg_match_all() to isolate the two components of each subexpression in the input string.
Use array_map() to replace each row of indexed match values with a single, associative element.
Use the spread operator (...) to unpack the newly modified matches array as individual associative arrays and feed that to array_merge_recursive(). The native behavior of array_merge_recursive() is to only create subarray structures where necessary.
Code: (Demo)
$str = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';
var_export(
array_merge_recursive(
...array_map(
fn($row) => [$row[1] => $row[2]],
preg_match_all(
'/(\*[A-Z\d]{2})(.+?)(?=$|\*[A-Z\d]{2})/',
$str,
$m,
PREG_SET_ORDER
) ? $m : []
)
)
);
Output:
array (
'*01' => 'the title',
'*35' => 'the author',
'*A7' => 'other useless infos',
'*AE' =>
array (
0 => 'other useful infos',
1 => 'some delimiters can be there multiple times',
),
)
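To make the merging step easier to see in isolation, here is a minimal sketch that feeds hand-written single-pair arrays to array_merge_recursive(). The $pairs data is made up for illustration and stands in for what array_map() produces; the repeated key is deliberately not consecutive.

```php
<?php
// Hypothetical single-pair arrays, standing in for the array_map() result.
$pairs = [
    ['*01' => 'the title'],
    ['*AE' => 'first'],
    ['*35' => 'the author'],
    ['*AE' => 'second'],
];

// The spread operator passes each single-pair array as its own argument.
// Keys that occur once keep their string value; repeated keys are
// collected into a subarray.
$result = array_merge_recursive(...$pairs);

var_export($result);
```

Only '*AE' ends up as a subarray here, and it keeps the position of its first occurrence. Keep in mind that array_merge_recursive() renumbers integer-like keys; the keys in this question always start with an asterisk, so that does not apply.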
Upvotes: 0
Reputation: 523
Use the PREG_SPLIT_DELIM_CAPTURE flag of the preg_split function to also get the captured delimiter (see documentation).
So in your case:
# The -1 is the limit parameter (no limit)
$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);
Now element 0 of $attributes holds everything before the first delimiter, followed by alternating captured delimiters and the values that come after them, so you can build your $matches array like this (assuming that you do not want to keep the first group):
$matches = [];
for ($i = 1; $i < sizeof($attributes) - 1; $i += 2) {
    $matches[$attributes[$i]] = $attributes[$i + 1];
}
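As an illustration of the layout the loop relies on, this small sketch runs the same preg_split() call on a shortened version of the sample string and dumps the raw result. The shortened $line is an assumption made to keep the output brief.

```php
<?php
// Shortened sample input (assumption: same format as in the question).
$line = '*01the title*35the author';

// The capturing group plus PREG_SPLIT_DELIM_CAPTURE keeps the delimiters
// in the result instead of discarding them.
$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);

var_export($attributes);
// Element 0 is the (here empty) text before the first delimiter;
// odd indexes hold the keys, even indexes the values.
```

Because the string starts with a delimiter, element 0 is an empty string, which is why the loop starts at index 1.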
In order to account for delimiters being present multiple times you can adjust the line inside the for loop to check whether this key already exists and in that case create an array.
Edit: a possibility to create an array if necessary is to use this code:
$matches = [];
for ($i = 1; $i < sizeof($attributes) - 1; $i += 2) {
    $key = $attributes[$i];
    if (array_key_exists($key, $matches)) {
        // Second occurrence of a key: wrap the existing string in an array first.
        if (!is_array($matches[$key])) {
            $matches[$key] = [$matches[$key]];
        }
        array_push($matches[$key], $attributes[$i + 1]);
    } else {
        $matches[$key] = $attributes[$i + 1];
    }
}
The downstream code can certainly be simplified, especially if you put all values in (possibly single element) arrays.
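A minimal sketch of that simplification, assuming it is acceptable for every key to map to an array (possibly with a single element). This changes the output shape compared with the one requested in the question.

```php
<?php
$line = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';

$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);

$matches = [];
for ($i = 1; $i < count($attributes) - 1; $i += 2) {
    // Appending with [] creates the subarray on first use,
    // so no existence or type checks are needed.
    $matches[$attributes[$i]][] = $attributes[$i + 1];
}

var_export($matches);
```

With this shape, consuming code can always loop over the values of a key without first checking whether it got a string or an array.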
Upvotes: 2
Reputation: 626794
You may match and capture each key into Group 1 and all the text up to the next delimiter that differs from the captured one into Group 2. Then, in a loop over the matches, split each value with the delimiter pattern; a value that splits into more than one piece belongs to a key that was repeated consecutively.
The regex is
(\*[A-Z0-9]{2})(.*?)(?=(?!\1)\*[A-Z0-9]{2}|$)
See the regex demo.
Details
(\*[A-Z0-9]{2}) - Delimiter, Group 1: a * and two uppercase letters or digits
(.*?) - Value, Group 2: any 0+ chars other than line break chars, as few as possible
(?=(?!\1)\*[A-Z0-9]{2}|$) - up to the delimiter pattern (\*[A-Z0-9]{2}) that is not equal to the text captured in Group 1 ((?!\1)) or end of string ($).
See the PHP demo:
$re = '/(\*[A-Z0-9]{2})(.*?)(?=(?!\1)\*[A-Z0-9]{2}|$)/';
$str = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';
$res = [];
if (preg_match_all($re, $str, $m, PREG_SET_ORDER, 0)) {
foreach ($m as $kvp) {
$tmp = preg_split('~\*[A-Z0-9]{2}~', $kvp[2]);
if (count($tmp) > 1) {
$res[$kvp[1]] = $tmp;
} else {
$res[$kvp[1]] = $kvp[2];
}
}
print_r($res);
}
Output:
Array
(
[*01] => the title
[*35] => the author
[*A7] => other useless infos
[*AE] => Array
(
[0] => other useful infos
[1] => some delimiters can be there multiple times
)
)
Upvotes: 1
Reputation: 587
OK, I am answering my own question on how to handle the same delimiter appearing multiple times. Thanks to @markus-ankenbrand for the start:
$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);
$matches = [];
for ($i = 1; $i < sizeof($attributes) - 1; $i += 2) {
if (isset($matches[$attributes[$i]]) && is_array($matches[$attributes[$i]])) {
$matches[$attributes[$i]][] = $attributes[$i + 1];
} elseif (isset($matches[$attributes[$i]]) && !is_array($matches[$attributes[$i]])) {
$currentValue = $matches[$attributes[$i]];
$matches[$attributes[$i]] = [$currentValue];
$matches[$attributes[$i]][] = $attributes[$i + 1];
} else {
$matches[$attributes[$i]] = $attributes[$i + 1];
}
}
The fat if/else statement does not look very nice, but it does what it needs to do.
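One possible way to slim the branching down, shown here only as a sketch: an (array) cast turns a string into a one-element array and leaves an existing array unchanged, so the two "key already exists" branches can be merged. The $key and $value variables are introduced for readability and are not part of the code above.

```php
<?php
$line = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';

$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);

$matches = [];
for ($i = 1; $i < count($attributes) - 1; $i += 2) {
    $key = $attributes[$i];
    $value = $attributes[$i + 1];

    if (!array_key_exists($key, $matches)) {
        // First occurrence: store the plain string.
        $matches[$key] = $value;
    } else {
        // Repeated key: (array) wraps a string, keeps an array as is.
        $matches[$key] = array_merge((array) $matches[$key], [$value]);
    }
}

var_export($matches);
```

The result has the same shape as the desired output in the question: strings for keys that occur once and a subarray for '*AE'.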
Upvotes: 0