getfugu

Reputation: 140

Splitting a string by spaces unless spaces are within curly or square brackets at the shallowest level

I want to separate a string into an array based on spaces, with the caveat that spaces within a pair of curly or square brackets should be ignored.

I was able to find some answers that are close to what I want here and here, but they don't handle brackets nested within other brackets.

How do I split this string:

foo bar["s 1"]{a:{b:["s 2", "s 3"]}, x:" [s 4] "} woo{c:y} [e:{" s [6]"}] [simple square bracket] {simple curly bracket}

Into this array?

["foo", "bar[\"s 1\"]{a:{b:[\"s 2\", \"s 3\"]}, x:\" [s 4] \"}", "woo{c:y}", "[e:{\" s [6]\"}]", "[simple square bracket]", "{simple curly bracket}"]

When using the regex from the first link, I modified the regular expression to work with square and curly brackets, and got the correct output for the simple, un-nested parts of the example, but not for the complex nested area. See here.

The answers at the second link rely on JSON formatting with colons. They don't apply here because my input will not necessarily be valid JSON, and it has no comparable character pattern that those answers could be adapted to.

According to a commenter, this may not be possible to do with regular expressions. Even if that is the case, any way of splitting the string that achieves the desired result would be considered a correct answer.

Upvotes: 1

Views: 280

Answers (1)

porcus

Reputation: 833

Regular expressions are great for certain things. But if you wish to support arbitrarily deeply nested expressions, then regular expressions aren't really the right tool for the job.
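
To illustrate the limitation, here is a small sketch. JavaScript's RegExp has no recursive patterns, so a pattern can only describe a fixed nesting depth. The pattern and the name splitOneLevel below are made up for this illustration (they are not the regex from the linked answers), and the pattern only understands one level of brackets:

```javascript
// Hypothetical one-level pattern: a token is a run of plain characters,
// [ ... ] groups and { ... } groups, where a group may not contain brackets.
const oneLevel = /(?:[^\s\[\]{}]|\[[^\[\]]*\]|\{[^{}]*\})+/g;

function splitOneLevel(input) {
  return input.match(oneLevel) || [];
}

// Works while brackets are not nested:
splitOneLevel('foo [a b] {c d}');  // → ["foo", "[a b]", "{c d}"]

// Breaks as soon as a bracket contains another bracket:
splitOneLevel('x{a:{b c}} y');     // → ["x", "a:{b c}", "y"]
```

You could extend the pattern to two or three levels by hand, but each extra level makes it longer, and it still fails one level deeper.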

Instead, consider the following approach which uses a stack to track beginnings and endings of bracketed expressions:

Sample code

function getfugu_split(input) {
  // Parts are returned unescaped: the \" in the output below is only how a
  // string literal displays a double quote, so no backslashes are added here.
  var i = 0, stack = [], parts = [], part = '';
  while (i < input.length) {
    var c = input[i]; i++;  // get character
    if (c === ' ' && stack.length === 0) {  // space outside all brackets: split here
      parts.push(part);  // append part
      part = '';  // reset part accumulator
      continue;
    }
    if (c === '{' || c === '[') stack.push(c);  // begin curly or square bracket
    else if (c === '}' && stack[stack.length-1] === '{') stack.pop();  // end curly bracket
    else if (c === ']' && stack[stack.length-1] === '[') stack.pop();  // end square bracket
    part += c; // append character to current part
  }
  if (part.length > 0) parts.push(part);  // append remaining part
  return parts;
}

Example usage

getfugu_split('foo bar["s 1"]{a:{b:["s 2", "s 3"]}, x:" [s 4] "} woo{c:y} [e:{" s [6]"}] [simple square bracket] {simple curly bracket}')

Output

["foo", "bar[\"s 1\"]{a:{b:[\"s 2\", \"s 3\"]}, x:\" [s 4] \"}", "woo{c:y}", "[e:{\" s [6]\"}]", "[simple square bracket]", "{simple curly bracket}"]

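Note that the above code almost certainly won't handle every requirement or edge case you may encounter. For example, unbalanced brackets may not be handled the way you'd expect: a closing bracket that doesn't match the most recent opening one is simply kept as an ordinary character, and an opening bracket that is never closed causes the rest of the string to end up in a single part. Consecutive spaces produce empty parts, and brackets inside quoted strings are counted like any other brackets. But if you understand what it's doing, you should be able to adapt it to suit your needs. I hope this helps! :)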
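
As one possible adaptation, here is a minimal sketch that treats double-quoted strings as opaque, skips empty parts, and throws on unbalanced input. The name splitOutsideBrackets and all three behaviours are my own assumptions rather than anything the question asks for. It also assumes quotes are never escaped inside the input, and it will not split on a space inside a quoted string even outside brackets.

```javascript
// Hypothetical variant of getfugu_split; name and behaviour are assumptions.
function splitOutsideBrackets(input) {
  const stack = [];      // expected closing brackets, innermost last
  const parts = [];
  let part = '';
  let inQuotes = false;  // assumption: quotes in the input are never escaped

  for (const c of input) {
    if (c === '"') {
      inQuotes = !inQuotes;
    } else if (!inQuotes) {
      if (c === ' ' && stack.length === 0) {
        if (part.length > 0) parts.push(part);  // skip empty parts
        part = '';
        continue;
      }
      if (c === '{') stack.push('}');
      else if (c === '[') stack.push(']');
      else if (c === '}' || c === ']') {
        // The closer must match the most recently opened bracket.
        if (stack.pop() !== c) throw new Error('Unbalanced ' + c + ' in input');
      }
    }
    part += c;  // brackets, quotes and inner spaces all stay in the part
  }
  if (inQuotes || stack.length > 0) {
    throw new Error('Unterminated quote or bracket in input');
  }
  if (part.length > 0) parts.push(part);
  return parts;
}
```

For the string in the question this returns the same six parts as above. Whether throwing is the right reaction to unbalanced input depends on your data, so you may prefer to return whatever was collected instead.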

Upvotes: 1
