How to use a nested regular expression in PHP?

I am trying to use regular expressions in PHP to achieve the following:

I am interested in obtaining an array with the citrus fruits. The rules are:

* The citrus list is in parentheses
* Citrus fruits are separated by ",", but could also be separated by " " only

I have tried:

<?php
    $string = "Citrus fruits (oranges, mandarins lemons) have many nutrients";
    $regex = "/\([,\s]*?(\w+)[,\s]*?\)/";
    preg_match_all($regex, $string, $matches);
    print_r($matches);
?>

But I can't get it to work. I've tried several expressions with no results.

Can anyone help?

Upvotes: 0

Views: 80

Answers (3)

The fourth bird

Reputation: 163362

The expression that you tried, \([,\s]*?(\w+)[,\s]*?\), will only match a single fruit, optionally surrounded by whitespace or commas, because there is no repetition.
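
To illustrate the point about the missing repetition, here is a small sketch comparing a one-item list with the list from the question. The one-item sample string and the variable names are my own, not from the question:

```php
<?php
// The pattern from the question, unchanged.
$regex = '/\([,\s]*?(\w+)[,\s]*?\)/';

// A list with a single item matches, because \w+ occurs exactly once.
$single = preg_match_all($regex, 'Citrus fruits ( lemons ) have many nutrients', $m1);

// With several items, a closing ) is required right after the first word
// and its separators, so nothing matches.
$several = preg_match_all($regex, 'Citrus fruits (oranges, mandarins lemons) have many nutrients', $m2);

var_dump($single, $several); // the two match counts
print_r($m1[1]);             // the captured word from the one-item list
```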

To get the repeated separate matches, you could use the \G anchor.

You don't have to make the character class non-greedy ([,\s]*?), as a comma or whitespace character cannot overlap with the word characters matched by \w+.

If you want to make sure that there is a closing parenthesis, you could use a positive lookahead:

(?:\((?=[^()]*\))|\G(?!\A)[,\h]+)\K\w+
  • (?: Non-capturing group
    • \( Match (
    • (?=[^()]*\)) Assert that a closing parenthesis follows on the right, with no other parentheses in between
    • | Or
    • \G(?!\A)[,\h]+ Assert the position at the end of the previous match (not at the start of the string), then match 1+ commas or horizontal whitespace chars
  • ) Close non-capturing group
  • \K Forget what is matched so far
  • \w+ Match 1+ word chars
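
As a rough sketch of how the pattern above could be used with preg_match_all (the variable names $pattern and $str are placeholders of mine, and ~ is just one possible delimiter):

```php
<?php
$pattern = '~(?:\((?=[^()]*\))|\G(?!\A)[,\h]+)\K\w+~';
$str = 'Citrus fruits (oranges, mandarins lemons) have many nutrients';

// Because of \K, each full match in $matches[0] is just the word itself.
preg_match_all($pattern, $str, $matches);

print_r($matches[0]);
```

Since there is no capture group, the words end up in $matches[0] rather than $matches[1].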

Note that using [,\h]+ could also match consecutive commas or horizontal whitespace chars, as in oranges,,,

If the comma is optional and the space is always there, you could also use ,?\h+ to prevent that.
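
A quick way to see the difference between the two separators is to run both variants on a list with repeated commas. The sample string below is made up for illustration:

```php
<?php
// Same pattern twice; only the separator part differs.
$loose  = '~(?:\((?=[^()]*\))|\G(?!\A)[,\h]+)\K\w+~';
$strict = '~(?:\((?=[^()]*\))|\G(?!\A),?\h+)\K\w+~';

$str = 'Citrus fruits (oranges,,, lemons) have many nutrients';

preg_match_all($loose, $str, $a);
preg_match_all($strict, $str, $b);

print_r($a[0]); // [,\h]+ skips over ",,, " and also finds lemons
print_r($b[0]); // ,?\h+ stops at the second comma, so only oranges
```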


Another option is to get the whole match first and then split on [,\h]+, which matches 1+ commas or horizontal whitespace chars:

(?<=\()\w+(?:[,\h]+\w+)*(?=\))

For example:

$re = '`(?<=\()\w+(?:[,\h]+\w+)*(?=\))`';
$str = 'Citrus fruits (oranges, mandarins lemons) have many nutrients';

// Match the whole list between the parentheses first
if (preg_match($re, $str, $matches)) {
    // Then split on runs of commas or horizontal whitespace
    print_r(preg_split('~[,\h]+~', $matches[0], -1, PREG_SPLIT_NO_EMPTY));
}

Output

Array
(
    [0] => oranges
    [1] => mandarins
    [2] => lemons
)

Upvotes: 1

user13469682

Try (?:(?!^)\G[ ,]+|\([ ]*)\K\w+

It works by finding the opening parenthesis ( and matching a word. \G then continues from the end of the previous match to pick up the next word, repeating until there are no more words.
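
A minimal sketch of running this pattern from PHP, assuming the string from the question (the variable names are mine):

```php
<?php
$pattern = '~(?:(?!^)\G[ ,]+|\([ ]*)\K\w+~';
$str = 'Citrus fruits (oranges, mandarins lemons) have many nutrients';

// Each full match is one word from the list.
preg_match_all($pattern, $str, $matches);

print_r($matches[0]);
```

Note that, as written, this pattern does not check for a closing parenthesis.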

Upvotes: 0

mchljams

Reputation: 441

Here is a suggestion. First, parse the string between the parentheses to get your list. Then, you could use str_replace() to change the spaces to commas. Finally, use explode() to convert the now comma-separated string to an array. The only problem would be if one of the items has a space in its name.
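
Those steps could be sketched roughly like this. The preg_match() call for the first step and the extra ", " replacement are my own assumptions about the details, not something prescribed above:

```php
<?php
$str = 'Citrus fruits (oranges, mandarins lemons) have many nutrients';
$fruits = [];

// Step 1: grab the text between the parentheses.
if (preg_match('/\(([^()]*)\)/', $str, $m)) {
    // Step 2: turn the separators into commas. ", " is replaced first so
    // that a comma followed by a space does not become two commas.
    $list = str_replace([', ', ' '], ',', trim($m[1]));

    // Step 3: split on the commas.
    $fruits = explode(',', $list);
}

print_r($fruits);
```

Without the ", " replacement, explode() would return an empty element between oranges and mandarins.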

Upvotes: 0
