Reputation: 23
I'm using the pattern /{(\w+)\s+((\w+="\w+")\s*)+/ to capture all attributes.
The problem is that it matches the input but doesn't capture the attributes one by one; the group keeps only the last attribute.
[person name="Jackson" family="Smith"]
or
[car brand="Benz" type="SUV"]
Upvotes: 1
Views: 103
Reputation: 47894
The \G (continue) metacharacter is the hero to call upon here.
Code: (PHP Demo) (Regex101 Demo)
$tag = '[person name="Jackson" family="Smith"]';
var_export(preg_match_all('~(?:\G|\[\w+) (\w+)="(\w+)"~', $tag, $out) ? array_combine($out[1], $out[2]) : []);
Output:
array (
'name' => 'Jackson',
'family' => 'Smith',
)
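One caveat worth illustrating: \G also matches at the very start of the subject string, so an attribute-like pair at offset 0 would be accepted even without an enclosing tag. The sketch below compares the pattern above with a variant that adds a (?!^) guard after \G; the input string is a made-up example, not taken from the question.

```php
<?php
// Hypothetical input: a stray pair at the very start, outside any tag.
$text = ' stray="pair" [person name="Jackson"]';

// Without the guard, \G matches at offset 0, so the stray pair is captured.
preg_match_all('~(?:\G|\[\w+) (\w+)="(\w+)"~', $text, $loose);

// With \G(?!^), continuation is only allowed right after a previous match.
preg_match_all('~(?:\G(?!^)|\[\w+) (\w+)="(\w+)"~', $text, $strict);

var_export($loose[1]);  // attribute names found without the guard
var_export($strict[1]); // attribute names found with the guard
```

With this input the unguarded pattern should report both stray and name, while the guarded one reports only name.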
If you need to group the attributes and values under the tag name, a single loop is enough for this too.
Code: (Demo)
$text = 'some text [person name="Jackson" family="Smith"] text [vehicle brand="Benz" type="SUV" doors="4" seats="7"]';
$result = []; // initialize so var_export() has a defined value even when nothing matches
foreach (preg_match_all('~(?:\G(?!^)|\[(\w+)) (\w+)="(\w+)"~', $text, $out, PREG_SET_ORDER) ? $out : [] as $matches) {
    if ($matches[1]) {
        $tag = $matches[1]; // cache the tag name for reuse with subsequent attr/val pairs
    }
    $result[$tag][$matches[2]] = $matches[3];
}
var_export($result);
Output:
array (
'person' =>
array (
'name' => 'Jackson',
'family' => 'Smith',
),
'vehicle' =>
array (
'brand' => 'Benz',
'type' => 'SUV',
'doors' => '4',
'seats' => '7',
),
)
Due to the concerns of @Thefourthbird and @Jan, I have included a lookahead to match the closing square bracket. I have also accommodated the possibility of a tag with zero attributes. If given more time (sorry, don't have more), I could probably refine the following snippet to be slightly cleaner, but I believe it validates and extracts accurately.
Code: (Demo)
$text = 'some text [person name="Jackson" family="Smith"] text [vehicle brand="Benz" type="SUV" doors="4" seats="7"] and [invalid closed="false" monkeywrench [lonetag] text [single gender="female"]';
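$result = []; // initialize so var_export() has a defined value even when nothing matches
foreach (preg_match_all('~\[(\w+)(?=(?: \w+="\w+")*])(]?)|(?:\G(?!^) (\w+)="(\w+)")~', $text, $out, PREG_SET_ORDER) ? $out : [] as $matches) {
    if ($matches[2]) {
        $result[$matches[1]] = []; // tag closed immediately: no attributes
    } elseif (!isset($matches[3])) {
        $tag = $matches[1]; // valid tag opened: cache its name for the pairs that follow
    } else {
        $result[$tag][$matches[3]] = $matches[4];
    }
}
var_export($result);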
Output:
array (
'person' =>
array (
'name' => 'Jackson',
'family' => 'Smith',
),
'vehicle' =>
array (
'brand' => 'Benz',
'type' => 'SUV',
'doors' => '4',
'seats' => '7',
),
'lonetag' =>
array (
),
'single' =>
array (
'gender' => 'female',
),
)
Upvotes: 2
Reputation: 43169
It is better to use two expressions (or a parser altogether) instead. Consider the following:
<?php
$junk = <<<END
lorem ipsum lorem ipsum
[person name="Jackson" family="Smith"]
lorem ipsum
[car brand="Benz" type="SUV"]
lorem ipsum lorem ipsum
END;
$tag = "~\[(?P<tag>\w+)[^][]*\]~";
$key_values = '~(?P<key>\w+)="(?P<value>[^"]*)"~';
preg_match_all($tag, $junk, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    echo "Name: {$match["tag"]}\n";
    preg_match_all($key_values, $match[0], $attributes, PREG_SET_ORDER);
    print_r($attributes);
}
?>
Here we have
\[(?P<tag>\w+)[^][]*\]
for likely tags and
(?P<key>\w+)="(?P<value>[^"]*)"
for key/value pairs. The rest is a foreach loop.
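If an associative array per tag is wanted rather than the raw match dump, the same two expressions can feed array_column(). This is only a sketch: the variable names and the one-line sample input are illustrative, and a tag name that occurs twice would overwrite the earlier entry.

```php
<?php
// Sample input mirroring the two tags from the question.
$text = 'lorem [person name="Jackson" family="Smith"] ipsum [car brand="Benz" type="SUV"]';

$tagPattern  = '~\[(?P<tag>\w+)[^][]*\]~';
$pairPattern = '~(?P<key>\w+)="(?P<value>[^"]*)"~';

$result = [];
preg_match_all($tagPattern, $text, $tags, PREG_SET_ORDER);
foreach ($tags as $tagMatch) {
    // Second pass: scan only the text of the matched tag for key/value pairs.
    preg_match_all($pairPattern, $tagMatch[0], $pairs, PREG_SET_ORDER);
    // array_column() indexes each pair's "value" capture by its "key" capture.
    $result[$tagMatch['tag']] = array_column($pairs, 'value', 'key');
}
var_export($result);
```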
Upvotes: 0
Reputation: 37367
You can try \[\S+ ((?:[^"]+"){2}) ((?:[^"]+"){2})\]
Explanation:
\[ - match [ literally
\S+ - match one or more non-whitespace characters
(?:...) - non-capturing group
[^"]+" - match one or more characters other than ", followed by a "; the {2} repeats this pattern twice
\] - match ] literally
The first capturing group will contain your first attribute and the second capturing group will contain the second attribute.
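A quick way to try this pattern in PHP is sketched below. The ~ delimiters and variable names are my own choice, and the three-attribute input is an added example to show that the pattern expects exactly two attributes per tag.

```php
<?php
$pattern = '~\[\S+ ((?:[^"]+"){2}) ((?:[^"]+"){2})\]~';

// Two attributes: each capturing group holds one complete key="value" pair.
preg_match($pattern, '[person name="Jackson" family="Smith"]', $m);
echo $m[1], "\n"; // first attribute
echo $m[2], "\n"; // second attribute

// Three attributes: preg_match() returns 0 because the pattern allows only two pairs.
$three = preg_match($pattern, '[vehicle brand="Benz" type="SUV" doors="4"]');
var_dump($three);
```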
Upvotes: 1