Mehdi Khademloo
Mehdi Khademloo

Reputation: 2812

Can Regex handle nested operation?

I have some string like:

"1+""2*3"*4"+"5*6""

and with some regexes, the answer should be:

(1+((2*3)*4)+(5*6))

can regex do this?

Upvotes: 4

Views: 88

Answers (1)

willeM_ Van Onsem
willeM_ Van Onsem

Reputation: 477676

If the input is valid, we can exploit some redundant constraints of such language to add the appropriate brackets. But this is actually more using "tricks". I really advice to use a more sophisticated tool, like a push down automaton (so a parser).

Well there is something here that we can spot: that is that if the next character is a number, or a sequence of double quote followed by a number, all other double quotes.

So we basically can do this with two regexes:

  • the first one replaces all double quotes that are followed by zero or more double quotes followed by a number; and
  • then replace all remaining double quotes (this is actually simply character replacement, so no regex necessary).

But this trick only works if the original input is valid. If it contains double quotes around operators, like "+", then it can produce totally different results.

In Python, we can for example use:

from re import sub

def add_brackets(text):
    return sub(r'["](?=\"*[(\d])', '(', text).replace('"', ')')

This then gives us:

>>> add_brackets('"1+""2*3"*4"+"5*6""')
'(1+((2*3)*4)+(5*6))'

This here thus works because we only consider numbers and operators. If we would add variables, it would still work, but if we add more sophisticated elements like functions, then the problem becomes harder.

"Recursive languages" (well languages where certain element(s) can be defined in terms of itself) however are better parsed by tooling that is constructed for that, a push down automaton.

The pumping lemma for regular languages shows that languages like (n)n (a language that contains strings with a number of opening brackets, followed by the same number of closing brackets), can not be validated with regular expressions. The language you describe here is an example of that. So this regex is not capable of validating that. Some programming languages like Perl have extended regular expressions, such that these can validate balancing brackets. These are not regular expressions, at least not as defined by Stephen Kleene.

Upvotes: 6

Related Questions