Jackson Smith

Reputation: 23

Capture multiple repeated groups in regex

I'm using the pattern /{(\w+)\s+((\w+="\w+")\s*)+/ to capture all attributes of tags like the ones below. The pattern matches the input, but it cannot capture the attributes one by one; it only captures the last attribute.

[person name="Jackson" family="Smith"]

or

[car brand="Benz" type="SUV"]
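In PCRE, a capturing group inside a quantifier keeps only the text from its last repetition, which is why only the final attribute survives. The sketch below demonstrates this with a simplified pattern; it assumes the opening { in the question was meant to be \[, and the variable names are illustrative only.

```php
<?php
// Simplified version of the question's pattern, ASSUMING "\[" instead of "{".
// Group 2 sits inside a "+" quantifier, so each repetition overwrites it.
$pattern = '/\[(\w+)(?:\s+(\w+="\w+"))+\]/';
$input = '[person name="Jackson" family="Smith"]';

if (preg_match($pattern, $input, $m)) {
    echo $m[1], "\n"; // the tag name
    echo $m[2], "\n"; // only the last attribute pair is kept
}
```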

Upvotes: 1

Views: 103

Answers (3)

mickmackusa

Reputation: 47894

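The \G (continue) anchor is the hero to call upon here: it matches only at the position where the previous match ended, so each attribute match is chained either to the tag name or to the attribute matched just before it.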

Code:

$tag = '[person name="Jackson" family="Smith"]';

var_export(preg_match_all('~(?:\G|\[\w+) (\w+)="(\w+)"~', $tag, $out) ? array_combine($out[1], $out[2]) : []);

Output:

array (
  'name' => 'Jackson',
  'family' => 'Smith',
)
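To see what \G buys you, here is a small comparison on a hypothetical input that also contains an attribute-like pair outside the tag. It uses the \G(?!^) form from the next snippet; the (?!^) stops \G from matching at the very start of the string. This is an illustrative sketch, not part of the original demo.

```php
<?php
// Hypothetical input: a tag followed by a stray key="value" pair.
$text = '[person name="Jackson" family="Smith"] stray="junk"';

// Without \G: every ` key="value"` pair anywhere in the string matches.
preg_match_all('~ (\w+)="(\w+)"~', $text, $loose);

// With \G(?!^): a pair matches only right after "[tagname" or right after the
// previous match, so matching stops at the first "]" of the tag.
preg_match_all('~(?:\G(?!^)|\[\w+) (\w+)="(\w+)"~', $text, $chained);

print_r($loose[1]);   // includes "stray"
print_r($chained[1]); // only the attributes inside the tag
```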

If you need to group the attribute/value pairs under their tag name, a single loop is enough for this too.

Code:

$text = 'some text [person name="Jackson" family="Smith"] text [vehicle brand="Benz" type="SUV" doors="4" seats="7"]';

$result = [];  // start empty so var_export() still works when nothing matches
foreach (preg_match_all('~(?:\G(?!^)|\[(\w+)) (\w+)="(\w+)"~', $text, $out, PREG_SET_ORDER) ? $out : [] as $matches) {
    if ($matches[1]) {
        $tag = $matches[1];  // cache the tag name for reuse with subsequent attr/val pairs
    }
    $result[$tag][$matches[2]] = $matches[3];
}

var_export($result);

Output:

array (
  'person' => 
  array (
    'name' => 'Jackson',
    'family' => 'Smith',
  ),
  'vehicle' => 
  array (
    'brand' => 'Benz',
    'type' => 'SUV',
    'doors' => '4',
    'seats' => '7',
  ),
)
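The if ($matches[1]) check works because, with PREG_SET_ORDER, a group that did not take part in a match but precedes a group that did is reported as an empty string. A short sketch with the same pattern on a single tag makes this visible:

```php
<?php
// Same pattern as above, applied to a single tag.
$text = '[person name="Jackson" family="Smith"]';
preg_match_all('~(?:\G(?!^)|\[(\w+)) (\w+)="(\w+)"~', $text, $out, PREG_SET_ORDER);

foreach ($out as $set) {
    // $set[1] holds the tag name on the first match and '' on \G continuations;
    // $set[2] and $set[3] always hold the attribute name and value.
    echo var_export($set[1], true), ' => ', $set[2], '=', $set[3], "\n";
}
```

One caveat: a tag literally named 0 would also be falsy in PHP, so $matches[1] !== '' is the stricter test if that can occur.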

In response to the concerns raised by @Thefourthbird and @Jan, I have added a lookahead that requires the closing square bracket, and I have accommodated tags with zero attributes. Given more time I could probably make the following snippet slightly cleaner, but I believe it validates and extracts accurately.

Code:

$text = 'some text [person name="Jackson" family="Smith"] text [vehicle brand="Benz" type="SUV" doors="4" seats="7"] and [invalid closed="false" monkeywrench [lonetag] text [single gender="female"]';

$result = [];  // start empty so var_export() still works when nothing matches
foreach (preg_match_all('~\[(\w+)(?=(?: \w+="\w+")*])(]?)|(?:\G(?!^) (\w+)="(\w+)")~', $text, $out, PREG_SET_ORDER) ? $out : [] as $matches) {
    if ($matches[2]) {
        $result[$matches[1]] = [];
    } elseif (!isset($matches[3])) {
        $tag = $matches[1];
    } else {
        $result[$tag][$matches[3]] = $matches[4];
    }
}

var_export($result);

Output:

array (
  'person' => 
  array (
    'name' => 'Jackson',
    'family' => 'Smith',
  ),
  'vehicle' => 
  array (
    'brand' => 'Benz',
    'type' => 'SUV',
    'doors' => '4',
    'seats' => '7',
  ),
  'lonetag' => 
  array (
  ),
  'single' => 
  array (
    'gender' => 'female',
  ),
)
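The !isset($matches[3]) branch relies on another PREG_SET_ORDER detail: trailing groups that did not participate in a match are omitted entirely rather than filled with ''. The sketch below (hypothetical input) prints how many elements each match array has:

```php
<?php
$pattern = '~\[(\w+)(?=(?: \w+="\w+")*])(]?)|(?:\G(?!^) (\w+)="(\w+)")~';
$text = '[lonetag] [person name="J"]';

preg_match_all($pattern, $text, $out, PREG_SET_ORDER);

foreach ($out as $set) {
    // Tag matches stop at group 2 (3 elements); attribute matches fill groups 3
    // and 4 (5 elements). Group 2 is "]" only for a tag with no attributes.
    echo count($set), ': ', $set[0], "\n";
}
```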

Upvotes: 2

Jan

Reputation: 43169

It is better to use two expressions (or a proper parser altogether) instead. Consider the following:

<?php

$junk = <<<END
lorem ipsum lorem ipsum
[person name="Jackson" family="Smith"]
lorem ipsum
[car brand="Benz" type="SUV"]

lorem ipsum lorem ipsum
END;

$tag = "~\[(?P<tag>\w+)[^][]*\]~";
$key_values = '~(?P<key>\w+)="(?P<value>[^"]*)"~';

preg_match_all($tag, $junk, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    echo "Name: {$match["tag"]}\n";

    preg_match_all($key_values, $match[0], $attributes, PREG_SET_ORDER);
    print_r($attributes);
}
?>

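Here we have

\[(?P<tag>\w+)[^][]*\]

for likely tags (a [, the tag name captured as tag, then any characters other than square brackets up to the closing ]) and

(?P<key>\w+)="(?P<value>[^"]*)"

for key/value pairs (the value may be any run of characters other than "). The rest is a foreach loop that applies the second expression to each tag found by the first.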
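If you want the results in a nested array rather than printed, the same two expressions can feed one; the sketch below assumes a shortened one-line input, and the use of array_column() and the $result shape are additions of mine, not part of the original answer.

```php
<?php
$junk = 'lorem [person name="Jackson" family="Smith"] ipsum [car brand="Benz" type="SUV"]';

$tag = '~\[(?P<tag>\w+)[^][]*\]~';
$key_values = '~(?P<key>\w+)="(?P<value>[^"]*)"~';

$result = [];
preg_match_all($tag, $junk, $tags, PREG_SET_ORDER);
foreach ($tags as $match) {
    // Second pass: look for key="value" pairs only inside this tag's text.
    preg_match_all($key_values, $match[0], $attributes, PREG_SET_ORDER);
    // Turn the list of named-group sets into key => value under the tag name.
    $result[$match['tag']] = array_column($attributes, 'value', 'key');
}

var_export($result);
```

A tag with no attributes simply ends up with an empty array, since array_column() on an empty list returns [].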

Upvotes: 0

Michał Turczyn

Reputation: 37367

You can try \[\S+ ((?:[^"]+"){2}) ((?:[^"]+"){2})\]

Explanation:

\[ - match [ literally

\S+ - match one or more non-whitespace characters (the tag name)

(?:...) - non-capturing group

[^"]+" - match one or more characters other than ", followed by a "; {2} repeats this twice, so each capturing group takes a whole key="value" pair

\] - match ] literally

The first capturing group holds your first attribute and the second capturing group holds the second attribute. Note that this pattern handles exactly two attributes per tag.
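A quick check of that two-attribute limit, using hypothetical inputs: the pattern fills both groups for a two-attribute tag, but a tag with a third attribute does not match at all, because ] is required right after the second pair.

```php
<?php
$pattern = '~\[\S+ ((?:[^"]+"){2}) ((?:[^"]+"){2})\]~';

// Exactly two attributes: both groups are filled.
$two = preg_match($pattern, '[person name="Jackson" family="Smith"]', $m);
echo $m[1], ' | ', $m[2], "\n";

// Three attributes: "]" is expected after the second pair, so there is no match.
$three = preg_match($pattern, '[vehicle brand="Benz" type="SUV" doors="4"]');
var_dump($three);
```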


Upvotes: 1
