Shawn G.

Reputation: 622

JavaScript RegExp

I'm trying to get

Match 1: test(testing() tester())

Match 2: theTest()

From

test(testing() tester()) theTest()

And I am using this RegExp

/([a-z]+)\((.*)\)/ig

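But it is matching the whole string instead.

I figure the problem lies in the .* (it is greedy, so it runs all the way to the last closing parenthesis), but I cannot figure out what to do about it.

How do I get the RegExp to match the outer parentheses without conflicting with the parentheses nested inside them?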

EDIT: Since I have found that this is not entirely possible with a single RegExp, is there a function or method that could accomplish what I am looking for?

Upvotes: 1

Views: 263

Answers (4)

ridgerunner

Reputation: 34395

Interesting problem. Yes, it is true that the JavaScript regex engine cannot match the outermost balanced pair of matching parentheses, but it can easily match innermost balanced pairs using the following simple regex pattern:

reInnerParens

/\([^()]*\)/

This regex can be effectively employed in an iterative manner to match nested balanced parentheses from the inside out. The following useful tested function uses this method to determine if a string has balanced, possibly nested to any depth, matching parentheses:

function isBalancedParens(text)

function isBalancedParens(text) {
    var reInnerParens = /\([^()]*\)/g;
    // Iteratively remove balanced pairs from inside out.
    while (text.search(reInnerParens) !== -1) {
        text = text.replace(reInnerParens, '');
    }
    // Any remaining parens indicate unbalanced pairs.
    if (/[()]/.test(text)) return false;
    return true;
}

The above function works by iteratively removing innermost balanced parentheses from the inside out until there are no more matches. If there are any remaining parentheses, then the string contains un-matched parentheses and is not balanced.
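To make the inside-out removal concrete, here is a minimal sketch that records the string after each pass. The helper name traceParenRemoval is hypothetical and only for illustration; it reuses the same reInnerParens pattern as above.

```javascript
// Hypothetical helper (not part of the function above): returns the
// string as it looks before the first pass and after every later pass.
function traceParenRemoval(text) {
    var reInnerParens = /\([^()]*\)/g;
    var steps = [text];
    // Each pass strips every innermost balanced pair currently present.
    while (text.search(reInnerParens) !== -1) {
        text = text.replace(reInnerParens, '');
        steps.push(text);
    }
    return steps;
}

var balanced = traceParenRemoval('a(b(c)d)e');
// balanced: ['a(b(c)d)e', 'a(bd)e', 'ae']

var unbalanced = traceParenRemoval('a(b(c)d');
// unbalanced: ['a(b(c)d', 'a(bd']
```

In the second case a stray "(" survives the last pass, which is exactly what the final test in isBalancedParens detects.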

A similar iterative technique can be used to solve the problem at hand. First, a regex is needed that matches a balanced pair of parentheses containing at least one inner pair of parentheses, but nested only one level deep. Here it is in free-spacing mode format:

reOuterParens

/* reOuterParens
    # Match outer parens having inner parens one level deep.
    \(          # Outer open paren.
    (           # $1: Contents of outer parens .
      (?:       # One or more nested parens (1 deep).
        [^()]*  # Zero or more non-parens.
        \(      # Inner open paren.
        [^()]*  # Zero or more non-parens.
        \)      # Inner close paren.
      )+        # One or more nested parens (1 deep).
      [^()]*    # Zero or more non-parens.
    )           # End $1: Contents of outer parens .
    \)          # Outer close paren.
*/
var reOuterParens = /\(((?:[^()]*\([^()]*\))+[^()]*)\)/g;
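As a quick check of what this pattern does and does not match, the sketch below runs it against three sample strings (the inputs other than the question's string are made up for illustration).

```javascript
var reOuterParens = /\(((?:[^()]*\([^()]*\))+[^()]*)\)/g;

// The question's string: only the pair that contains inner pairs matches.
var oneDeep = 'test(testing() tester()) theTest()'.match(reOuterParens);
// oneDeep: ['(testing() tester())']

// A pair with no inner parens does not match at all.
var flat = 'theTest()'.match(reOuterParens);
// flat: null

// With deeper nesting, only the pair exactly one level above the
// innermost pair matches on a given pass.
var twoDeep = 'a(b(c(d)))'.match(reOuterParens);
// twoDeep: ['(c(d))']
```

The last case is why the pattern has to be applied repeatedly: each pass only reaches one level further out.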

The following tested JavaScript function iteratively applies this regex to "hide" all inner parentheses as HTML entities. Once this is completed, then only the desired outermost parentheses remain.

function getOutermostParens(text)

// Match and return all outermost "word(..(..))" patterns from string.
function getOutermostParens(text) {
    var reOuterParens = /\(((?:[^()]*\([^()]*\))+[^()]*)\)/g;
    var results = [];
    // Ensure all (possibly nested) matching parentheses are properly balanced.
    if (!isBalancedParens(text)) return null;
    text = text.replace(/&/g, '&amp;'); // Temporarily hide html entities.
    // Iteratively hide all parens nested one level deep.
    while (text.search(reOuterParens) !== -1) {
        // Hide nested parens by converting to html entities.
        text = text.replace(reOuterParens,
            function(m0, m1){
                m1 = m1.replace(/[()]/g,
                    function(n0){
                        return {'(':'&#40;', ')': '&#41;'}[n0];
                    });
                return '('+ m1 +')';
            });
    }
    // Match all outermost "word(...)" and load into results array.
    text.replace(/\w+\([^()]*\)/g,
        function(m0){
            m0 = m0.replace(/&#4[01];/g, // Restore hidden parens.
                function(n0){
                    return {'&#40;': '(', '&#41;': ')'}[n0];
                });
            // Restore temporarily hidden html entities.
            m0 = m0.replace(/&amp;/g, '&');
            results.push(m0);
            return ''; // Not used.
        });
    return results;
}

Note that inner, nested () parentheses characters are hidden by replacing them with their HTML entity equivalents (i.e. &#40; and &#41;), but to do this safely, all HTML entities that may exist in the original string must first be protected. This is done by replacing all & with &amp; at the beginning of the routine; these are all restored at the end of the routine.
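Since the question's edit asks whether a plain function could do the job, here is a rough non-regex alternative for comparison. It is not the method described above: it walks the string once and tracks parenthesis depth. The name scanOutermostCalls is hypothetical, and the sketch assumes a "word" means a run of \w characters directly before the opening parenthesis.

```javascript
// Hypothetical alternative: collect top-level "word(...)" spans by
// counting paren depth instead of hiding inner parens.
function scanOutermostCalls(text) {
    var results = [];
    var depth = 0;
    var start = -1;      // Start index of the current top-level call.
    var wordStart = -1;  // Start index of the current run of word chars.
    for (var i = 0; i < text.length; i++) {
        var ch = text.charAt(i);
        if (ch === '(') {
            if (depth === 0) start = wordStart;
            depth++;
        } else if (ch === ')') {
            if (depth === 0) return null; // Close paren with no opener.
            depth--;
            if (depth === 0) {
                // Only keep spans that began with a word.
                if (start !== -1) results.push(text.slice(start, i + 1));
                start = -1;
                wordStart = -1;
            }
        } else if (depth === 0) {
            // Track word runs only outside of any parens.
            if (/\w/.test(ch)) {
                if (wordStart === -1) wordStart = i;
            } else {
                wordStart = -1;
            }
        }
    }
    // Leftover depth means an opener was never closed.
    return depth === 0 ? results : null;
}

var calls = scanOutermostCalls('test(testing() tester()) theTest()');
// calls: ['test(testing() tester())', 'theTest()']
```

Like getOutermostParens, this sketch returns null for unbalanced input; unlike it, no entity escaping is needed because the string is never rewritten.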

Upvotes: 2

Rajesh Paul

Reputation: 7009

Use the following regexp:

/[a-z]+\(([a-z]+\(\) [a-z]+\(\))*\)/gi

Full code:

str.match(/[a-z]+\(([a-z]+\(\) [a-z]+\(\))*\)/gi);

O/P:

["test(testing() tester())", "theTest()"]
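A short check of this pattern, as a sketch: it reproduces the output above for the question's exact string, but because the inner part is hard-coded as two empty calls separated by one space, other shapes behave differently. The second input below is made up for illustration.

```javascript
var re = /[a-z]+\(([a-z]+\(\) [a-z]+\(\))*\)/gi;

// The question's string matches as shown above.
var given = 'test(testing() tester()) theTest()'.match(re);
// given: ['test(testing() tester())', 'theTest()']

// A single inner call does not fit the hard-coded inner shape,
// so the outer call is missed and the inner one is returned instead.
var singleInner = 'outer(inner()) last()'.match(re);
// singleInner: ['inner()', 'last()']
```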

Upvotes: -1

prashantkumar1190

Reputation: 9

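    String i = "test(testing() tester()) theTest()";

    String regex = "\\w+\\(\\w+\\(\\)\\s\\w+\\(\\)\\)|\\w+\\(\\)";
    // Requires java.util.regex.Pattern and java.util.regex.Matcher.
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(i);
    // Loop so that every match is printed, not just the first.
    while (m.find()) {
        System.out.println(m.group());
    }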

Try this regex (shown here in Java) if your text always has exactly this shape.

Upvotes: -1

adeneo

Reputation: 318182

Why not just split the string on the last space?

str.split(/ (?=[^ ]*$)/);
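For reference, a small sketch of what that split returns. The lookahead only matches a space that has no further spaces after it, so the string is cut once, at the last space. The second input is made up to show the limit of the approach.

```javascript
var parts = 'test(testing() tester()) theTest()'.split(/ (?=[^ ]*$)/);
// parts: ['test(testing() tester())', 'theTest()']

// With more than two top-level calls, everything before the last
// space stays together in the first element.
var three = 'a(b()) c() d()'.split(/ (?=[^ ]*$)/);
// three: ['a(b()) c()', 'd()']
```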

Upvotes: 1
