Sean
Sean

Reputation: 1108

Regular expression for matching n amount of parenthesis

I need to parse expressions of the following form:

(S (A (B (D xyz)) (C m)))

The amount of ( will always equal the amount of ), but there may be any arbitrary number of opening and closing parenthesis pairs between the (S ). In this case, I would want to extract the (A (B (D xyz)) (C m)). There may be any number of (S ) clauses in a file, so I can't simply do a ^(S .* )$ kind of pattern matching.

If I knew the amount of potential opening and closing parenthesis pairs between the (S ) this wouldn't be as difficult, but I'm not sure how to write a regular expression that will know to match an arbitrary amount of ().

Any help in obtaining the regexp pattern would be greatly appreciated. Thanks in advance.

Upvotes: 3

Views: 1005

Answers (3)

user557597
user557597

Reputation:

Probably a record descent parse would be the best bet. But if you just want to find (S) balanced, it could be done with a regex that does recursion in the engine.

It will find the outter most balance. If you were looking to nesting like (S(S)) that might involve recursively calling a function that implements the regex, passing the 'core' of a sucessfull match. And possibly creating a parent-child structure in the process. But if its this involved, a real parser might be the solution.

How it can be solved with a Perl regex -

$str = '(some (stuff  (S (A (B (D xyz)) (C m))) the end ) (S extra))';

$regex = qr~
[(]
\s* S \s*
(                   # 1
    (                       # 2
      [(]
      (?:  (?> [^()]+ ) 
        |  (?2)                                         
      )*                                               
      [)]
    )
 |
    [^)]*
)
[)]
~x;

while ($str =~ /$regex/g)
{
    print "found '$1'\n";
}

prints

found '(A (B (D xyz)) (C m))'
found 'extra'

Upvotes: 0

Sergey Kalinichenko
Sergey Kalinichenko

Reputation: 726609

This cannot be done in theory, and can be done in practice only when maximal number of nested parentheses is known upfront. That solution requires a rather non-pleasant expression, and is usually attempted as a curious homework exercise. Here is a link with better explanation of why regexp language is not powerful enough to solve the matching parentheses problem.

You need a parser to solve this problem; a simple recursive descent one will do the trick. Wikipedia article at the link above has a sample implementation in C, which you should be able to translate to other languages with relative ease.

Upvotes: 1

Matthew Flaschen
Matthew Flaschen

Reputation: 284836

Matching an arbitrary number is impossible with pure regular expressions. In other words, you can not match a count that it is unknown when you're generating/writing the regular expression is impossible. Matching n pairs (however high n is) is possible as long as you know n when generating the regex.

Upvotes: 1

Related Questions