mathem8tic
mathem8tic

Reputation: 41

Regular expression for validating math equation

I'm trying to build a regex that validates a math equation. The equation itself is very simple, I'm looking to make an English readable equation, that I will later return as true or false. An example would be like so.

((1 and 2) or 3)

In this example, I will swap out any numbers with either true, or false. I will also replace, "and" with "&&" and "or" with "||" in order to run the equation using PHP. The response to this will ultimately be either true, or false.

An example final equation would look something like this:

((true && true) || true)

Here are some more examples of what should be considered valid.

(1 or 2 or 3)

((1 and 2 and 3) or (4 and 5))

So, my question comes in two parts.

  1. Is it possible to create a regex expression to validate all possible valid equations? One big road block for me is understanding how I could validate that all opening "(" also have the closing ")".
  2. Is it advisable to use a regex expression in order to validate client side in this circumstance? I am already able to validate the expression using AJAX and PHP, so am I just overthinking this?

Upvotes: 4

Views: 1485

Answers (2)

user557597
user557597

Reputation:

This can be done in php (uses PCRE engine).
Below is just an example.
You could comment out the errors check, then insert boundary constructs
around the regex to make it definitively pass/fail.

The biggest problem is not the recursion, but defining the content boundary
conditions. I've pretty much boiled it down for you. These checks have to
be maintained any how you do it, state, stacks ..., its all the same.

( This regex was constructed and tested using RegexFormat 6 )

Sample input:

 (((   (1 and 2 and 3) or (9) or ( ( 4 and 5)) and 5 ) and   7) )

Tested output:

 **  Grp 0 -  ( pos 0 , len 64 ) 
(((   (1 and 2 and 3) or (9) or ( ( 4 and 5)) and 5 ) and   7) )  
 **  Grp 1 -  ( pos 1 , len 62 ) 
((   (1 and 2 and 3) or (9) or ( ( 4 and 5)) and 5 ) and   7)   
 **  Grp 2 -  NULL 
 **  Grp 3 -  NULL 
 **  Grp 4 -  NULL 

Regex:
5/29 All Forms:

Empty form ( ) not allowed
Empty form ) ( not allowed
Form ) and ( ok
Form ) and 2 and ( ok
Form ( 1 and 2 ) ok
Form ( 1 ) ok
Form ) and 2 ) ok
Form ( 1 and ( ok
Form ( whitespace ( or ) whitespace ) ok

 # (?s)(?:\(((?!\s*\))(?&core))\)|\s*([()]))(?(DEFINE)(?<core>(?>(?&content)|\((?:(?!\s*\))(?&core))\)(?!\s*\())+)(?<content>(?>(?<=\))\s*(?:and|or)\s*(?=\()|(?<=\))\s*(?:(?:and|or)\s+\d+)+\s*(?:and|or)\s*(?=\()|(?<=\()\s*\d+(?:(?:\s+(?:and|or)\s+)?\d+)*\s*(?=\))|(?<=\))\s*(?:(?:and|or)\s+\d+)+\s*(?=\))|(?<=\()\s*(?:\d+\s+(?:and|or))+\s*(?=\()|\s+)))


 # //////////////////////////////////////////////////////
 # // The General Guide to 3-Part Recursive Parsing
 # // ----------------------------------------------
 # // Part 1. CONTENT
 # // Part 2. CORE
 # // Part 3. ERRORS

 (?s)                       # Dot-All modifier (used in a previous incarnation)

 (?:
      #           (                          # (1), Take off CONTENT (not used here)
      #                (?&content) 
      #           )
      #        |                           # OR

      \(                         # Open Paren's
      (                          # (1), parens CORE
           (?! \s* \) )               # Empty form '( )' not allowed
           (?&core) 
      )
      \)                         # Close Paren's
   |                           # OR
      \s* 
      (                          # (2), Unbalanced (delimeter) ERRORS
                                      # - Generally, on a whole parse, these
                                      #   are delimiter or content errors
           [()]                       
      )
 )

 # ///////////////////////
 # // Subroutines
 # // ---------------

 (?(DEFINE)

      # core
      (?<core>                   # (3)
           (?>
                (?&content) 
             |  
                \(                         # Open Paren's
                (?:
                     (?! \s* \) )               # Empty form '( )' not allowed
                     (?&core) 
                )
                \)                         # Close Paren's
                (?! \s* \( )               # Empty form ') (' not allowed
           )+
      )

      # content 
      (?<content>                # (4)
           (?>
                (?<= \) )                  # Form ') and ('
                \s* 
                (?: and | or )
                \s* 
                (?= \( )

             |  
                (?<= \) )                  # Form ') and 2 and ('
                \s* 
                (?:
                     (?: and | or )
                     \s+ 
                     \d+ 
                )+
                \s* 
                (?: and | or )
                \s* 
                (?= \( )

             |  
                (?<= \( )                  # Form '( 1 and 2 )'
                \s* 
                \d+ 
                (?:
                     (?:
                          \s+ 
                          (?: and | or )
                          \s+ 
                     )?
                     \d+ 
                )*
                \s* 
                (?= \) )

             |  
                (?<= \) )                  # Form ') and 2 )'
                \s* 
                (?:
                     (?: and | or )
                     \s+ 
                     \d+ 
                )+
                \s* 
                (?= \) )

             |  
                (?<= \( )                  # Form '( 1 and ('
                \s* 
                (?:

                     \d+ 
                     \s+ 
                     (?: and | or )
                )+
                \s* 
                (?= \( )

             |  
                \s+                        # Interstitial whitespace
                                           # '( here (' or ') here )'
           )
      )

 )

Upvotes: 1

Mohammad Jafar Mashhadi
Mohammad Jafar Mashhadi

Reputation: 4251

Using pumping lemma it can be easily proven that the strings you want to validate belong to a language that is not regular so it cannot be parsed using regular expressions. (Actually in the way of proving this The fact that you cannot match opening and closing parenthesis or even count them is used - as you mentioned in first part of your question) Although some regex engines may provide some additional functionalities that can be used for parsing this (like recursive patterns) but it's not 100% in accordance with the formal definition of regular expressions.

You may consider parsing the parenthesis yourself and validating the expression inside them using simple regular expressions, or you can use a parse tree, similar to what compilers do.

Upvotes: 2

Related Questions