Reputation: 140
I want to separate a string into an array based on spaces, with the caveat that spaces within a pair of curly or square brackets should be ignored.
I was able to find some answers that are close to what I want here and here, but they don't handle brackets nested within other brackets.
How do I split this string:
foo bar["s 1"]{a:{b:["s 2", "s 3"]}, x:" [s 4] "} woo{c:y} [e:{" s [6]"}] [simple square bracket] {simple curly bracket}
Into this array?
["foo", "bar[\"s 1\"]{a:{b:[\"s 2\", \"s 3\"]}, x:\" [s 4] \"}", "woo{c:y}", "[e:{\" s [6]\"}]", "[simple square bracket]", "{simple curly bracket}"]
I modified the regular expression from the first link to work with square and curly brackets. It gives the correct output for the simple, un-nested parts of the example, but not for the complex nested area. See here.
The second link's answers rely on JSON formatting with colons. They don't apply here because my input will not necessarily be valid JSON, and it has no similar character pattern to adapt the answers to.
According to a commenter, this may not be possible to do with regular expressions. Even if that is the case, any way of splitting the string to achieve the desired result would be considered a correct answer.
Upvotes: 1
Views: 280
Reputation: 833
Regular expressions are great for certain things. But if you wish to support arbitrarily deeply nested expressions, then regular expressions aren't really the right tool for the job.
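To illustrate the limitation, here is a minimal sketch of a pattern that only understands one level of brackets. The names `ONE_LEVEL` and `splitOneLevel` are made up for this example and are not taken from the linked answers. It works on flat input but cuts a nested group at the first closing brace:

```javascript
// Hypothetical one-level pattern: a token is a run of bracketed groups
// (with no awareness of nesting) and non-space, non-bracket characters.
var ONE_LEVEL = /(?:\[[^\]]*\]|\{[^}]*\}|[^\s\[\]{}])+/g;

function splitOneLevel(input) {
  // match() returns null when nothing matches, so fall back to an empty array
  return input.match(ONE_LEVEL) || [];
}

console.log(splitOneLevel('a{b c} d'));     // ["a{b c}", "d"]
console.log(splitOneLevel('a{b:{c d}} e')); // ["a{b:{c d}", "e"]  (the outer "}" is lost)
```

Each extra nesting level would need another layer added to the pattern by hand, so no fixed pattern of this kind covers arbitrary depth.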
Instead, consider the following approach which uses a stack to track beginnings and endings of bracketed expressions:
function getfugu_split(input) {
    var i = 0, stack = [], parts = [], part = '';
    while(i < input.length) {
        var c = input[i]; i++; // get character
        if (c == ' ' && stack.length == 0) {
            parts.push(part.replace(/"/g, '\\\"')); // append part
            part = ''; // reset part accumulator
            continue;
        }
        if (c == '{' || c == '[') stack.push(c); // begin curly or square brace
        else if (c == '}' && stack[stack.length-1] == '{') stack.pop(); // end curly brace
        else if (c == ']' && stack[stack.length-1] == '[') stack.pop(); // end square brace
        part += c; // append character to current part
    }
    if (part.length > 0) parts.push(part.replace(/"/g, '\\\"')); // append remaining part
    return parts;
}
getfugu_split('foo bar["s 1"]{a:{b:["s 2", "s 3"]}, x:" [s 4] "} woo{c:y} [e:{" s [6]"}] [simple square bracket] {simple curly bracket}')
["foo", "bar[\"s 1\"]{a:{b:[\"s 2\", \"s 3\"]}, x:\" [s 4] \"}", "woo{c:y}", "[e:{\" s [6]\"}]", "[simple square bracket]", "{simple curly bracket}"]
Note that the above code almost certainly won't handle every possible requirement you may have or edge case you're likely to encounter. For example, imbalanced square or curly braces may not be handled the way you'd expect. But if you understand what it's doing, then you should be able to adapt it to suit your needs. I hope this helps! :)
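As one possible adaptation, here is a sketch of a stricter variant. The function name `splitOutsideBrackets` and its rules are assumptions for illustration, not requirements from the question: it treats double-quoted text as opaque (so brackets and spaces inside quotes are ignored), skips empty parts caused by repeated spaces, returns the parts unescaped, and throws on imbalanced input instead of guessing.

```javascript
// Hypothetical stricter variant of the stack-based splitter.
function splitOutsideBrackets(input) {
  var closers = { '{': '}', '[': ']' };
  var stack = [];        // closing brackets we still expect, innermost last
  var parts = [];
  var part = '';
  var inQuotes = false;  // assumption: only double quotes delimit strings

  for (var i = 0; i < input.length; i++) {
    var c = input[i];
    if (c === '"') {
      inQuotes = !inQuotes;
    } else if (!inQuotes) {
      if (c === ' ' && stack.length === 0) {
        if (part.length > 0) parts.push(part); // ignore runs of spaces
        part = '';
        continue;
      }
      if (c === '{' || c === '[') {
        stack.push(closers[c]);
      } else if (c === '}' || c === ']') {
        // pop() yields undefined on an empty stack, which never equals c
        if (stack.pop() !== c) {
          throw new SyntaxError('Unexpected "' + c + '" at index ' + i);
        }
      }
    }
    part += c;
  }
  if (inQuotes) throw new SyntaxError('Unterminated double quote');
  if (stack.length > 0) throw new SyntaxError('Missing closing "' + stack.pop() + '"');
  if (part.length > 0) parts.push(part);
  return parts;
}

console.log(splitOutsideBrackets('foo bar["s 1"] woo{c:y}'));
// ["foo", "bar[\"s 1\"]", "woo{c:y}"]
```

Whether throwing is the right reaction to imbalanced input depends on where the strings come from. Returning the partial result, or treating the stray bracket as a literal character, would be equally reasonable choices.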
Upvotes: 1