Bluefire

Reputation: 14109

Regex to parse out values in parentheses

I want to be able to separate a string into values by splitting by spaces, but if something is in parentheses I need it to be in a single value. So for example, (a b c) d e (f g) h should become ['a b c', 'd', 'e', 'f g', 'h']. What's a regex that will do that for me?

Upvotes: 1

Views: 117

Answers (2)

wp78de

Reputation: 18950

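Right, the standard JavaScript regex engine cannot handle nested (recursive) patterns. If you use Perl or PHP (PCRE), you can do it with a recursive pattern like the one below. Use the x (extended) flag so the layout whitespace is ignored, and note that each parenthesized match still includes its outer parentheses, which you strip afterwards. (.NET has no subroutine calls; it needs balancing groups instead.)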

(?(DEFINE)
  (?<group>\((?:[^()]++|(?&group))*\))
)
(?&group)|[^\s()]+
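If the parentheses are never nested, as in the question's example, the standard JavaScript engine is enough on its own. Here is a minimal sketch under that assumption (balanced, non-nested parentheses); splitFlat is a hypothetical name used only for illustration:

```javascript
// Sketch for the flat (non-nested) case only: assumes parentheses are
// balanced and never nested, as in the question's example.
function splitFlat(str) {
  const tokens = [];
  // Group 1 captures the contents of a (...) group; otherwise take a run
  // of characters that are neither whitespace nor parentheses.
  const re = /\(([^()]*)\)|[^\s()]+/g;
  let m;
  while ((m = re.exec(str)) !== null) {
    // Use the captured contents for a group, the whole match otherwise.
    tokens.push(m[1] !== undefined ? m[1] : m[0]);
  }
  return tokens;
}

console.log(splitFlat('(a b c) d e (f g) h'));
// -> [ 'a b c', 'd', 'e', 'f g', 'h' ]
```

With nested input such as (a (b) c), [^()]* refuses to cross the inner parentheses, so this version breaks down there; that is exactly the case that needs recursion.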

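It can also be done in JavaScript with an extended regular expression library such as XRegExp. Its matchRecursive function returns the contents of the outermost balanced delimiters, so only the parenthesized values come back (d, e and h have to be picked up separately). Here is a sample to give you the idea: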

const XRegExp = require('xregexp'); // npm install xregexp
const str1 = '(a b c) d e (f g) h';
const s = XRegExp.matchRecursive(str1, '\\(', '\\)', 'g');
console.log(s);
// -> ['a b c', 'f g']
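If you'd rather avoid a dependency and also get the unparenthesized values, a small hand-written scanner that tracks the nesting depth can produce the whole list. This swaps the regex for a plain character loop; splitTopLevel is a hypothetical name, and the sketch assumes balanced parentheses with whitespace between tokens (adjacent groups such as (a)(b) would be merged, and empty groups are dropped):

```javascript
// Hypothetical helper (not part of XRegExp): splits on whitespace at the
// top level only, keeping each top-level (...) group together and
// returning its contents without the outer parentheses.
// Assumes the parentheses in the input are balanced.
function splitTopLevel(str) {
  const tokens = [];
  let current = '';
  let depth = 0;
  for (const ch of str) {
    if (ch === '(') {
      // Only the outermost '(' is dropped; nested ones are kept as text.
      if (depth > 0) current += ch;
      depth++;
    } else if (ch === ')') {
      depth--;
      // Likewise, only the outermost ')' is dropped.
      if (depth > 0) current += ch;
    } else if (/\s/.test(ch) && depth === 0) {
      // Whitespace outside any group ends the current token.
      if (current !== '') tokens.push(current);
      current = '';
    } else {
      // Any other character (including whitespace inside a group).
      current += ch;
    }
  }
  if (current !== '') tokens.push(current);
  return tokens;
}

console.log(splitTopLevel('(a b c) d e (f g) h'));
// -> [ 'a b c', 'd', 'e', 'f g', 'h' ]
console.log(splitTopLevel('(a (b) c) d e (f g) h'));
// -> [ 'a (b) c', 'd', 'e', 'f g', 'h' ]
```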

Upvotes: 0

Ammar Alyousfi

Reputation: 4372

As mentioned in the comments, JavaScript regular expressions cannot handle nesting on their own, so here is code that solves your problem (including nested parentheses) by combining a regular expression with some extra processing:

var str = '(a (b) c) d e (f g) h';
var match;
// Either a parenthesized chunk (lazy, so it stops at the first ')')
// or a run of non-whitespace characters
var myRe = /\([^]+?\)|\S+/g;
var result = [];

while ((match = myRe.exec(str)) !== null) {
  result.push(match[0]);
}

var tmp = "";
var final = [];
for (var i = 0; i < result.length; i++) {
  var leftP = (result[i].match(/\(/g) || []).length;
  var rightP = (result[i].match(/\)/g) || []).length;
  if (leftP !== rightP) {
    // Unbalanced chunk (nested parentheses): append the following chunks,
    // separated by a space, until the parentheses balance again
    tmp += result[i];
    for (var j = i + 1; j < result.length; j++) {
      tmp += ' ' + result[j];
      if ((tmp.match(/\(/g) || []).length === (tmp.match(/\)/g) || []).length) {
        final.push(tmp);
        tmp = "";
        i = j; // the outer loop's i++ then moves to the next unread chunk
        break;
      }
    }
  } else {
    final.push(result[i]);
  }
}
// Strip one pair of outer parentheses: '(a (b) c)' -> 'a (b) c'
for (var i = 0; i < final.length; i++) {
  final[i] = final[i].replace(/^\(([^]+)\)$/, '$1');
}

console.log(final); // [ 'a (b) c', 'd', 'e', 'f g', 'h' ]

It might not be optimized, but I think it solves your problem.
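To see why the merging loop is needed, here is the first step on its own. The lazy \([^]+?\) stops at the first closing parenthesis, so the nested group comes back as two unbalanced chunks that have to be rejoined:

```javascript
// Illustration: only the chunking regex from the answer, no merging.
// '(a (b) c)' is split into '(a (b)' and 'c)', neither of which is balanced.
const chunks = '(a (b) c) d e (f g) h'.match(/\([^]+?\)|\S+/g);
console.log(chunks);
// -> [ '(a (b)', 'c)', 'd', 'e', '(f g)', 'h' ]
```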

Upvotes: 2
