llanato

Reputation: 2491

PHP: Explode on commas outside of brackets

Below is a string that I'm trying to split only on the commas that sit outside of any brackets.

Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour

1st Attempt

preg_split("/[\[\]|()]+/", "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour", -1, PREG_SPLIT_NO_EMPTY);

Which returns:

[0] => Wheat Flour 
[1] => 2%
[2] => Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin 
[3] => B3
[4] => , Thiamin 
[5] => B1
[6] => , Ascorbic Acid
[7] => , Water, Yeast, Salt, Vegetable Oils 
[8] => Palm, Rapeseed
[9] => , Soya Flour

2nd Attempt

preg_split('/\|(?![^(]*\))/', "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour");

Returns:

[0] => Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed), Soya Flour

The first attempt is the closest I've come to the output I'm after, which is:

[0] => "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid]"
[1] => "Water"
[2] => "Yeast"
[3] => "Salt"
[4] => "Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed))"
[5] => "Soya Flour"

Upvotes: 1

Views: 123

Answers (2)

Wiktor Stribiżew

Reputation: 626845

You can match the items with preg_match_all instead of splitting:

$text = "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour"; 
if (preg_match_all('~[^][(),\s][^][(),]*(?:\s*(?:(\[(?:[^][]++|(?1))*])|(\((?:[^()]++|(?2))*\))))*~', $text, $matches)) {
    print_r($matches[0]); 
}

See the regex demo and the PHP demo.

Details:

  • [^][(),\s] - a char other than square and round brackets, a comma and whitespace
  • [^][(),]* - zero or more chars other than square and round brackets and a comma
  • (?: - start of a non-capturing group:
    • \s* - zero or more whitespaces
    • (?: - start of an inner non-capturing group matching either
      • (\[(?:[^][]++|(?1))*]) - Group 1: a [...] substring with any nested [...] inside
      • | - or
      • (\((?:[^()]++|(?2))*\)) - Group 2: a (...) substring with any nested parentheses inside
    • ) - end of the inner group
  • )* - end of the outer group, which is repeated zero or more times.
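
As a rough usage sketch, the pattern above can be wrapped in a small helper. The function name extractTopLevelItems is hypothetical, and the trim() call is an assumption that surrounding whitespace is not wanted in the items; the pattern itself is copied unchanged.

```php
<?php
// Hypothetical helper name; the pattern is the one from this answer, unchanged.
function extractTopLevelItems(string $text): array
{
    $pattern = '~[^][(),\s][^][(),]*(?:\s*(?:(\[(?:[^][]++|(?1))*])|(\((?:[^()]++|(?2))*\))))*~';

    // preg_match_all() returns 0 when nothing matches and false on error;
    // both cases are treated here as "no items".
    if (!preg_match_all($pattern, $text, $matches)) {
        return [];
    }

    // Assumption: leading/trailing whitespace is not wanted in the items.
    return array_map('trim', $matches[0]);
}

$text = "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour";
print_r(extractTopLevelItems($text));
```

Because the pattern matches the items rather than the separators, an input with no usable item simply yields an empty array.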

Upvotes: 1

anubhava

Reputation: 785156

You may use this PCRE regex for splitting:

(?:(\((?:[^()]*|(?-1))*\))|(\[(?:[^][]*|(?-1))*\]))(*SKIP)(*F)|\h*,\h*

RegEx Demo

Code:

$s = 'Wheat Flour [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour';
$re = '~(?:(\((?:[^()]*|(?-1))*\))|(\[(?:[^][]*|(?-1))*\]))(*SKIP)(*F)|\h*,\h*~';

print_r(preg_split($re, $s));

Output:

Array
(
    [0] => Wheat Flour [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid]
    [1] => Water
    [2] => Yeast
    [3] => Salt
    [4] => Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed))
    [5] => Soya Flour
)

RegEx Explained:

  • (?:: Start non-capture group
    • (\((?:[^()]*|(?-1))*\)): Recursive pattern to match a possibly nested (...) substring
    • |: OR
    • (\[(?:[^][]*|(?-1))*\]): Recursive pattern to match a possibly nested [...] substring
  • ): End non-capture group
  • (*SKIP)(*F): Skip and fail this match, so the bracketed substring is never split on and stays intact in the split result
  • |: OR
  • \h*,\h*: Match a comma surrounded with 0 or more whitespaces on either side
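
The sample string in this answer omits the "(2%)" part of the question's input, so here is a minimal sketch that runs the same pattern on the question's full string. The function name splitOutsideBrackets is hypothetical, and PREG_SPLIT_NO_EMPTY is an added assumption to drop empty items (for example from a trailing comma).

```php
<?php
// Hypothetical helper name; the pattern is the one from this answer, unchanged.
function splitOutsideBrackets(string $text): array
{
    $pattern = '~(?:(\((?:[^()]*|(?-1))*\))|(\[(?:[^][]*|(?-1))*\]))(*SKIP)(*F)|\h*,\h*~';

    // Assumption: empty items are not wanted, hence PREG_SPLIT_NO_EMPTY.
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on error; fall back to an empty array.
    return $parts === false ? [] : $parts;
}

$s = 'Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour';
print_r(splitOutsideBrackets($s));
```

With the question's input this should produce the six items the question asks for, since "(2%)" is just another balanced group that gets skipped.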

Upvotes: 4
