Below is a string I'm trying to split only on commas that are outside brackets, i.e. not inside any (...) or [...] group, including nested ones.
Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour
preg_split("/[\[\]|()]+/", "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour", -1, PREG_SPLIT_NO_EMPTY);
Which returns:
[0] => Wheat Flour
[1] => 2%
[2] => Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin
[3] => B3
[4] => , Thiamin
[5] => B1
[6] => , Ascorbic Acid
[7] => , Water, Yeast, Salt, Vegetable Oils
[8] => Palm, Rapeseed
[9] => , Soya Flour
preg_split('/\|(?![^(]*\))/', "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour");
Returns:
[0] => Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed), Soya Flour
The first attempt is the closest I've come to the output I'm trying to get, which is:
[0] => "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid]"
[1] => "Water"
[2] => "Yeast"
[3] => "Salt"
[4] => "Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed))"
[5] => "Soya Flour"
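If a regex is not a hard requirement, the same split can be sketched without one by scanning the string and tracking bracket depth. This is a swapped-in technique (a plain character scan), not something from the question or the answers below. The function name splitTopLevel is hypothetical, and the sketch assumes the brackets are balanced; it does not check that a "(" is closed by ")" rather than "]".

```php
<?php
// Hypothetical helper (not from the question): split $text on commas that
// are not inside (...) or [...] at any nesting level.
// Assumes balanced brackets; "(" and "[" share one depth counter.
function splitTopLevel(string $text): array
{
    $parts = [];
    $depth = 0;     // current bracket nesting depth
    $current = '';  // characters collected for the current item

    foreach (str_split($text) as $ch) {
        if ($ch === '(' || $ch === '[') {
            $depth++;
        } elseif ($ch === ')' || $ch === ']') {
            $depth--;
        }

        if ($ch === ',' && $depth === 0) {
            // Top-level comma: finish the current item.
            $parts[] = trim($current);
            $current = '';
        } else {
            $current .= $ch;
        }
    }
    $parts[] = trim($current); // the item after the last top-level comma

    return $parts;
}

$text = "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour";
print_r(splitTopLevel($text));
```

Because str_split works byte by byte, multibyte UTF-8 text is still safe here: no byte of a multibyte character can equal an ASCII bracket or comma.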
You can match the items you want instead of splitting:
$text = "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour";
if (preg_match_all('~[^][(),\s][^][(),]*(?:\s*(?:(\[(?:[^][]++|(?1))*])|(\((?:[^()]++|(?2))*\))))*~', $text, $matches)) {
    print_r($matches[0]);
}
Details:
[^][(),\s] - a char other than square and round brackets, a comma, or whitespace
[^][(),]* - zero or more chars other than square and round brackets and commas
(?: - a non-capturing group:
\s* - zero or more whitespaces
(?: - either
(\[(?:[^][]++|(?1))*]) - a [...] substring with nested [...] substrings inside (Group 1, which recurses into itself via (?1))
| - or
(\((?:[^()]++|(?2))*\)) - a (...) substring with any nested parentheses inside (Group 2, which recurses via (?2))
))* - close the alternation and the non-capturing group; the whole sequence is optional and may repeat zero or more times.
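To see the recursive part of that pattern on its own, the sketch below applies only the [...] subpattern (Group 1) to a few made-up sample strings. The samples are assumptions chosen for illustration.

```php
<?php
// Only the recursive [...] subpattern from the answer: "[", then any number of
// runs of non-bracket chars ([^][]++) or recursive calls to group 1 via (?1),
// then "]".
$re = '~(\[(?:[^][]++|(?1))*])~';

$samples = ['[a, b]', '[a, [b, c], d]', '[a, [b]'];
foreach ($samples as $s) {
    if (preg_match($re, $s, $m)) {
        echo $s, ' => ', $m[0], PHP_EOL;
    } else {
        echo $s, ' => no match', PHP_EOL;
    }
}
```

The nested sample is matched as a whole. The unbalanced third sample cannot match from its first "[", so the engine moves on and only its inner, balanced "[b]" is matched.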
You may use this PCRE regex for splitting:
(?:(\((?:[^()]*|(?-1))*\))|(\[(?:[^][]*|(?-1))*\]))(*SKIP)(*F)|\h*,\h*
Code:
$s = 'Wheat Flour [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour';
$re = '~(?:(\((?:[^()]*|(?-1))*\))|(\[(?:[^][]*|(?-1))*\]))(*SKIP)(*F)|\h*,\h*~';
print_r(preg_split($re, $s));
Output:
Array
(
[0] => Wheat Flour [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid]
[1] => Water
[2] => Yeast
[3] => Salt
[4] => Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed))
[5] => Soya Flour
)
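The sample string in this answer leaves out the "(2%)" from the question's input. As a quick check, the sketch below wraps the same pattern in a hypothetical splitOutsideBrackets() function and runs it on the question's exact string. The "(2%)" chunk is skipped the same way as the other bracketed chunks.

```php
<?php
// Hypothetical wrapper around the answer's pattern (name chosen for illustration).
function splitOutsideBrackets(string $s): array
{
    // Left side: a balanced (...) or [...] chunk, skipped via (*SKIP)(*F).
    // Right side: a top-level comma plus surrounding horizontal whitespace.
    $re = '~(?:(\((?:[^()]*|(?-1))*\))|(\[(?:[^][]*|(?-1))*\]))(*SKIP)(*F)|\h*,\h*~';
    return preg_split($re, $s);
}

// The question's exact input, including "(2%)".
$text = "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour";
print_r(splitOutsideBrackets($text));
```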
RegEx Explained:
(?: : Start non-capture group
(\((?:[^()]*|(?-1))*\)) : Recursive pattern to match a possibly nested (...) substring
| : OR
(\[(?:[^][]*|(?-1))*\]) : Recursive pattern to match a possibly nested [...] substring
) : End non-capture group
(*SKIP)(*F) : Skip and fail this match: (*F) forces a failure and (*SKIP) makes the engine resume after the bracketed chunk, so it is kept intact in the split result
| : OR
\h*,\h* : Match a comma surrounded by 0 or more horizontal whitespace characters on either side
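The (*SKIP)(*F) idea is easier to see with a smaller pattern. The sketch below is a simplified, non-recursive variant (assumption: parentheses are not nested and square brackets are ignored), applied to a made-up sample string, with a plain comma split alongside it for comparison.

```php
<?php
// Simplified, non-recursive variant of the answer's pattern (illustration only):
// - left alternative: a flat (...) chunk; (*SKIP)(*F) fails the match and
//   makes the engine resume scanning after that chunk
// - right alternative: a comma with optional horizontal whitespace around it
$withSkip = '~\([^()]*\)(*SKIP)(*F)|\h*,\h*~';
$plain    = '~\h*,\h*~';

$s = 'a, b (c, d), e';
print_r(preg_split($withSkip, $s)); // the comma inside (...) is left alone
print_r(preg_split($plain, $s));    // also splits inside (...)
```

The first call keeps "b (c, d)" together, while the plain split breaks it into "b (c" and "d)".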