Reputation: 35
Let's say I have this string:
1 + 2 * (3 + (23 + 53 - (132 / 5) + 5) - 1) + 2 / 'test + string' - 52
I want to split it into an array of operators and non-operators, but anything inside parentheses () or single quotes ' must not be split.
I want the output to be:
[1, "+", 2, "*", "(3 + (23 + 53 - (132 / 5) + 5) - 1)", "+", 2, "/", "'test + string'", "-", 52]
I'm using this code:
preg_split("~['\(][^'()]*['\)](*SKIP)(*F)|([+\-*/^])+~", $str, -1, PREG_SPLIT_DELIM_CAPTURE);
This does what I want with the operators and the single-quoted string, but not with the parentheses: it only keeps (132 / 5) (the deepest nested parenthetical expression) together and splits all the others, giving me this output:
[1, "+", 2, "*", "(3", "+", "(23", "+", 53, "-", "(132 / 5)", "+", "5)", "-", "1)", "+", 2, "/", "'test + string'", "-", 52]
How can I ensure that the outermost parenthetical expression and all of its contents remain together?
Upvotes: 2
Views: 95
Reputation: 47991
I do like @thefourthbird's recursive subpattern, but I would prefer to standardize the output elements so that the whitespace between them is removed.
Instead of delimiter capturing or skip-fail, I'll match each whole token, restart the fullstring match with \K, and then consume the optional space as the delimiter.
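Code:
$str = "1 + 2 * (3 + (23 + 53 - (132 / 5) + 5) - 1) + 2 / 'test + string' - 52";
// Match one token (balanced parentheses, quoted string, number, or operator),
// forget it with \K, then split on the optional space that follows.
$result = preg_split(
    "~(?:(\((?:[^()]+|(?1))*\))|'[^']*'|[\d.]+|[*/^+-])\K ?~",
    $str,
    -1,
    PREG_SPLIT_NO_EMPTY
);
print_r($result);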
I have used similar techniques in other answers on SO. Another consideration is: how do you want to handle signed numbers? Should the numeric entity retain the sign symbol, or should the sign be separated as if it were an operator?
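If the sign should stay with the number, one possible approach is to tokenize first and then merge a leading + or - into the operand that follows it. The sketch below is only an illustration of that option: the function name tokenizeWithSigns and its variables are hypothetical, it uses preg_match_all rather than the preg_split call above, and it assumes a sign is unary when it appears at the start of the expression or directly after another operator.

```php
<?php
// Hypothetical helper: tokenizes an expression and keeps unary signs
// attached to the operand that follows them.
function tokenizeWithSigns(string $expr): array
{
    // Same token alternatives as the split pattern: balanced parentheses,
    // single-quoted string, number, or a single operator.
    preg_match_all("~(\((?:[^()]++|(?1))*\))|'[^']*'|[\d.]+|[*/^+-]~", $expr, $m);

    $out = [];
    $sign = '';            // pending unary sign, if any
    $expectOperand = true; // true at the start and after an operator

    foreach ($m[0] as $tok) {
        $isOperator = in_array($tok, ['+', '-', '*', '/', '^'], true);

        // A + or - seen where an operand is expected is treated as a sign.
        if ($isOperator && $expectOperand && $sign === '' && ($tok === '+' || $tok === '-')) {
            $sign = $tok;
            continue;
        }

        if ($isOperator) {
            $out[] = $tok;
            $expectOperand = true;
        } else {
            $out[] = $sign . $tok; // prepend the pending sign (may be empty)
            $sign = '';
            $expectOperand = false;
        }
    }

    return $out;
}

print_r(tokenizeWithSigns("-1 + 2 * -3"));
```

Note that under this assumption a sign in front of a parenthesized group is also glued to the group, which may or may not be what you want.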
Upvotes: 2
Reputation: 163477
You might use a pattern that recurses the first subpattern to match balanced parentheses and then applies (*SKIP)(*F). After the alternation you can still use the capture group, which will be group 2, and its values will be kept thanks to the PREG_SPLIT_DELIM_CAPTURE flag.
To remove the empty entries, you can add the PREG_SPLIT_NO_EMPTY flag.
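To see the recursive part on its own, here is a minimal sketch that only extracts the first balanced parenthetical from the question's string. The variable names $balanced and $m are just for illustration; the subpattern is the same group 1 used in the full pattern.

```php
<?php
// Input string from the question.
$str = "1 + 2 * (3 + (23 + 53 - (132 / 5) + 5) - 1) + 2 / 'test + string' - 52";

// Group 1 matches "(", then any mix of non-parenthesis runs or a recursive
// call to group 1 itself via (?1), and finally the closing ")".
$balanced = "~(\((?:[^()]++|(?1))*\))~";

preg_match($balanced, $str, $m);
echo $m[1], PHP_EOL; // (3 + (23 + 53 - (132 / 5) + 5) - 1)
```

Because (?1) re-enters the group for every nested "(", the outermost pair is matched as one unit, which is what the question's [^'()]* could not do.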
(?:(\((?:[^()]++|(?1))*\))|'[^']*')(*SKIP)(*F)|([+\-*/^])
$str = "1 + 2 * (3 + (23 + 53 - (132 / 5) + 5) - 1) + 2 / 'test + string' - 52";
$result = preg_split("~(?:(\((?:[^()]++|(?1))*\))|'[^']*')(*SKIP)(*F)|([+\-*/^])~", $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($result);
Output
Array
(
    [0] => 1
    [1] => +
    [2] => 2
    [3] => *
    [4] => (3 + (23 + 53 - (132 / 5) + 5) - 1)
    [5] => +
    [6] => 2
    [7] => /
    [8] => 'test + string'
    [9] => -
    [10] => 52
)
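Since the split only happens on the operators, the pieces between them can still carry the spaces that surrounded them in the input (for example "1 " rather than "1"). If that matters, a possible follow-up step is to trim each element and, to get closer to the unquoted numbers in the question's desired output, convert numeric strings to numbers. This is a sketch, not part of the pattern itself; $pattern, $parts and $tokens are illustrative names.

```php
<?php
$str = "1 + 2 * (3 + (23 + 53 - (132 / 5) + 5) - 1) + 2 / 'test + string' - 52";
$pattern = "~(?:(\((?:[^()]++|(?1))*\))|'[^']*')(*SKIP)(*F)|([+\-*/^])~";

$parts = preg_split($pattern, $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

// Trim each element, then turn numeric strings into int/float values.
$tokens = array_map(
    static function (string $part) {
        $part = trim($part);
        return is_numeric($part) ? $part + 0 : $part;
    },
    $parts
);

var_export($tokens);
```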
Upvotes: 3