turnt
turnt

Reputation: 3255

Regex for parenthesis (JavaScript)

This is the regexp I created so far: \((.+?)\)

This is my test string: (2+2) + (2+3*(2+3))

The matches I get are:

(2+2) And (2+3*(2+3)

I want my matches to be:

(2+2) And (2+3*(2+3))

How should I modify my regular expression?

Upvotes: 0

Views: 682

Answers (2)

Giulio Franco
Giulio Franco

Reputation: 3230

You cannot parse parentesized expressions with regular expression. There is a mathematical proof that regular expressions can't do this. Parenthesized expressions are a context-free grammar, and can thus be recognized by pushdown automata (stack-machines).

You can, anyway, define a regular expression that will work on any expression with less than N parentheses, with an arbitrary finite N (even though the expression will get complex). You just need to acknowledge that your parentheses might contain another arbitrary number of parenteses.

\(([^()]+(\([^)]+\)[^)]*)*)\)

It works like this:

  1. \(([^()]+ matches an open parenthesis, follwed by whatever is not a parenthesis;
  2. (\([^)]+\)[^)]*)* optionally, there may be another group, formed by an open parenthesis, with something inside it, followed by a matching closing parenthesis. Some other non-parenthesis character may follow. This can be repeated an arbitrary amount of times. Anyway, at last, there must be
  3. )\) another closed parenthesis, which matches with the first one.

This should work for nesting depth 2. If you want nesting depth 3, you have to further recurse, allowing each of the groups I described at point (2) to have a nested parenthesized group.

Things will get much easier if you use a stack. Such as:

foundMatches = [];
mStack = [];
start = RegExp("\\(");
mid = RegExp("[^()]*[()]?");
idx = 0;
while ((idx = input.search(start.substr(idx))) != -1) {
    mStack.push(idx);
    //Start a search
    nidx = input.substr(idx + 1).search(mid);
    while (nidx != -1 && idx + nidx < input.length) {
        idx += nidx;
        match = input.substr(idx).match(mid);
        match = match[0].substr(-1);
        if (match == "(") {
            mStack.push(idx);
        } else if (mStack.length == 1) {
            break;
        }
        nidx = input.substr(idx + 1).search(mid);
    }
    //Check the result
    if (nidx != -1 && idx + nidx < input.length) {
        //idx+nidx is the index of the last ")"
        idx += nidx;
        //The stack contains the index of the first "("
        startIdx = mStack.pop();
        foundMatches.push(input.substr(startIdx, idx + 1 - startIdx));
    }
    idx += 1;
}

Upvotes: 4

Joohwan
Joohwan

Reputation: 2512

How about you parse it yourself using a loop without the help of regex? Here is one simple way:

  • You would have to have a variable, say "level", which keeps track of how many open parentheses you have come across so far (initialize it with a 0).
  • You would also need a string buffer to contain each of your matches ( e.g. (2+2) or (2+3 * (2+3)) ) .
  • Finally, you would need somewhere you can dump the contents of your buffer into whenever you finish reading a match.

  • As you read the string character by character, you would increment level by 1 when you come across "(", and decrement by 1 when you come across ")". You would then put the character into the buffer.

  • When you come across ")" AND the level happens to hit 0, that is when you know you have a match. This is when you would dump the contents of the buffer and continue.

This method assumes that whenever you have a "(" there will always be a corresponding ")" in the input string. This method will handle arbitrary number of parentheses.

Upvotes: 1

Related Questions