Reputation: 41
I'm trying to build a regex that validates a math equation. The equation itself is very simple, I'm looking to make an English readable equation, that I will later return as true or false. An example would be like so.
((1 and 2) or 3)
In this example, I will swap out any numbers with either true, or false. I will also replace, "and" with "&&" and "or" with "||" in order to run the equation using PHP. The response to this will ultimately be either true, or false.
An example final equation would look something like this:
((true && true) || true)
Here are some more examples of what should be considered valid.
(1 or 2 or 3)
((1 and 2 and 3) or (4 and 5))
So, my question comes in two parts.
Upvotes: 4
Views: 1485
Reputation:
This can be done in php (uses PCRE engine).
Below is just an example.
You could comment out the errors check, then insert boundary constructs
around the regex to make it definitively pass/fail.
The biggest problem is not the recursion, but defining the content boundary
conditions. I've pretty much boiled it down for you. These checks have to
be maintained any how you do it, state, stacks ..., its all the same.
( This regex was constructed and tested using RegexFormat 6 )
Sample input:
((( (1 and 2 and 3) or (9) or ( ( 4 and 5)) and 5 ) and 7) )
Tested output:
** Grp 0 - ( pos 0 , len 64 )
((( (1 and 2 and 3) or (9) or ( ( 4 and 5)) and 5 ) and 7) )
** Grp 1 - ( pos 1 , len 62 )
(( (1 and 2 and 3) or (9) or ( ( 4 and 5)) and 5 ) and 7)
** Grp 2 - NULL
** Grp 3 - NULL
** Grp 4 - NULL
Regex:
5/29 All Forms:
Empty form ( )
not allowed
Empty form ) (
not allowed
Form ) and (
ok
Form ) and 2 and (
ok
Form ( 1 and 2 )
ok
Form ( 1 )
ok
Form ) and 2 )
ok
Form ( 1 and (
ok
Form ( whitespace (
or ) whitespace )
ok
# (?s)(?:\(((?!\s*\))(?&core))\)|\s*([()]))(?(DEFINE)(?<core>(?>(?&content)|\((?:(?!\s*\))(?&core))\)(?!\s*\())+)(?<content>(?>(?<=\))\s*(?:and|or)\s*(?=\()|(?<=\))\s*(?:(?:and|or)\s+\d+)+\s*(?:and|or)\s*(?=\()|(?<=\()\s*\d+(?:(?:\s+(?:and|or)\s+)?\d+)*\s*(?=\))|(?<=\))\s*(?:(?:and|or)\s+\d+)+\s*(?=\))|(?<=\()\s*(?:\d+\s+(?:and|or))+\s*(?=\()|\s+)))
# //////////////////////////////////////////////////////
# // The General Guide to 3-Part Recursive Parsing
# // ----------------------------------------------
# // Part 1. CONTENT
# // Part 2. CORE
# // Part 3. ERRORS
(?s) # Dot-All modifier (used in a previous incarnation)
(?:
# ( # (1), Take off CONTENT (not used here)
# (?&content)
# )
# | # OR
\( # Open Paren's
( # (1), parens CORE
(?! \s* \) ) # Empty form '( )' not allowed
(?&core)
)
\) # Close Paren's
| # OR
\s*
( # (2), Unbalanced (delimeter) ERRORS
# - Generally, on a whole parse, these
# are delimiter or content errors
[()]
)
)
# ///////////////////////
# // Subroutines
# // ---------------
(?(DEFINE)
# core
(?<core> # (3)
(?>
(?&content)
|
\( # Open Paren's
(?:
(?! \s* \) ) # Empty form '( )' not allowed
(?&core)
)
\) # Close Paren's
(?! \s* \( ) # Empty form ') (' not allowed
)+
)
# content
(?<content> # (4)
(?>
(?<= \) ) # Form ') and ('
\s*
(?: and | or )
\s*
(?= \( )
|
(?<= \) ) # Form ') and 2 and ('
\s*
(?:
(?: and | or )
\s+
\d+
)+
\s*
(?: and | or )
\s*
(?= \( )
|
(?<= \( ) # Form '( 1 and 2 )'
\s*
\d+
(?:
(?:
\s+
(?: and | or )
\s+
)?
\d+
)*
\s*
(?= \) )
|
(?<= \) ) # Form ') and 2 )'
\s*
(?:
(?: and | or )
\s+
\d+
)+
\s*
(?= \) )
|
(?<= \( ) # Form '( 1 and ('
\s*
(?:
\d+
\s+
(?: and | or )
)+
\s*
(?= \( )
|
\s+ # Interstitial whitespace
# '( here (' or ') here )'
)
)
)
Upvotes: 1
Reputation: 4251
Using pumping lemma it can be easily proven that the strings you want to validate belong to a language
that is not regular so it cannot be parsed using regular expressions. (Actually in the way of proving this The fact that you cannot match opening and closing parenthesis or even count them is used - as you mentioned in first part of your question) Although some regex engines may provide some additional functionalities that can be used for parsing this (like recursive patterns) but it's not 100% in accordance with the formal definition of regular expressions.
You may consider parsing the parenthesis yourself and validating the expression inside them using simple regular expressions, or you can use a parse tree, similar to what compilers do.
Upvotes: 2