user3838356

Find string in possibly multiple parentheses?

I am looking for a regular expression that distinguishes a string containing a numerical value enclosed in parentheses from a string containing one only outside of them. The problem is that parentheses may be nested inside each other:

So, for example the expression should match the following strings:

But it should not match any of the following:

So far I've tried

\d[A-Za-z] \)

and other simple patterns like it. The problem with this one is that it does not match example 2, because that example has a ( string after it.

How could I solve this one?

Upvotes: 1

Views: 145

Answers (2)

bignose

Reputation: 32309

The problem is not one of pattern matching. That means regular expressions are not the right tool for this.

Instead, you need lexical analysis and parsing. There are many libraries available for that job.

You might try the parsing or pyparsing libraries.
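For a check this small, the parsing idea can also be sketched by hand, without either library: scan the string once, keep a counter of how many parentheses are currently open, and report a hit when a digit appears while the counter is above zero. This is only a minimal sketch under assumptions. The function name `has_digit_inside_parens` and the sample strings are hypothetical (they are not the examples from the question), a "numerical value" is taken to mean any ASCII digit, and an unclosed ( is treated as still open.

```python
import string


def has_digit_inside_parens(text):
    """Return True if any ASCII digit occurs at nesting depth >= 1."""
    depth = 0  # number of currently open parentheses
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            # Ignore a stray ")" so the depth never goes negative.
            if depth > 0:
                depth -= 1
        elif ch in string.digits and depth > 0:
            return True
    return False


# Hypothetical inputs, not taken from the question.
print(has_digit_inside_parens("a (b (c 12) d)"))
print(has_digit_inside_parens("12 (a (b) c)"))
```

Because the counter tracks depth directly, the nesting level does not matter, which is the part a plain regular expression struggles with.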

Upvotes: 1

l'L'l

Reputation: 47169

These types of regexes are not always easy, but it is sometimes possible to come up with one, provided the input remains somewhat consistent. A pattern along these lines should work:

(.*(\([\d]+[^(].*\)|\(.*[^)][\d]+.*\)).*)

Code:

import re

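# Plain raw string: the Python 2-only ur'' prefix is a syntax error in Python 3.
p = re.compile(r'(.*(\([\d]+[^(].*\)|\(.*[^)][\d]+.*\)).*)', re.MULTILINE)

# searchtext is assumed to already hold the text to test, one candidate per line.
result = p.findall(searchtext)
print(result)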

Result:

https://regex101.com/r/aL8bB8/1

Upvotes: 0
