user1796078
user1796078

Reputation: 61

Need to create regular expression in Javascript to check the valid conditional string

I want to create the regular expression in javascript which will check the valid conditional string like

-1 OR (1 AND 2) AND 1

-1 OR (1 AND 2)

-1 OR 2

-1 OR 1 OR 1

-1 AND 1 AND 1

The string should not contain 'AND' and 'OR'. For example - 1 OR 2 AND 3 is invalid. -It should be (1 OR 2) AND 3 or 1 or (2 AND 3).

I tried the following Regex. It works for most of the conditions but it is failing to check the above condition.

/^(\s*\(\d+\s(AND|OR)\s\d+\)|\s*\d+)((\s*(AND|OR)\s*)(\(\d+\s(AND|OR)\s\d+\)|\s*\d+))*$/

Can anyone please help me to sort out the above problem.

Upvotes: 6

Views: 725

Answers (2)

ridgerunner
ridgerunner

Reputation: 34395

Interesting question. And phant0m's answer is very educational! (and should be used if you understand parsers).

If you want to do this using just regex, the following solution correctly validates an arbitrarily nested logical statement using JavaScript.

Rules/Assumptions:

  • A valid statement consists of only numbers, parentheses, spaces, the AND logical operator and the OR logical operator.
  • The statement must contain at least two "tokens", separated by a logical operator, where each token is either a "number" or a "parenthesized unit".
  • A "number" token is a numeric integer having one or more decimal digits immediately preceded with an optional sign (either + or -).
  • A "parenthesized unit" token is two or more tokens, separated by a logical operator, enclosed within matching open and close parentheses.
  • The statement as a whole may contain more than two tokens but all tokens must be separated by the same, single operator; either AND or OR.
  • Each parenthesized unit may contain more than two tokens but all tokens must be separated by the same, single operator; either AND or OR.
  • Any amount of whitespace may be used between any of the elements (parentheses, numbers and logical operators) but at least one space is required between the numbers and the logical operator.
  • The AND and OR logical operators are not case sensitive.

Examples of valid logical statements:

"1 AND 2"
"1 AND 2 AND 3"
"1 OR 2"
"-10 AND -20"
"100 AND +200 AND -300"
"1 AND (2 OR 3)"
"1 AND (2 OR 3) AND 4"
"1 OR ((2 AND 3 AND 4) OR (5 OR 6 OR 7))"
"( 1 and 2 ) AND (1 AND 2)"

Examples of invalid logical statements:

"1x"            // Invalid character.
"1 AND"         // Missing token.
"1 AND 2 OR 3"  // Mixed logical operators.
"(1"            // Unbalanced parens.
"(((1 AND 2)))" // Too many parens.
"(1 AND) (2)"   // Missing token.
"1"             // Missing logical operator and second number
"1OR2OR3OR4"    // Missing spaces between numbers and operators.
"(1) AND (2)"   // Invalid parentheses.

A Regex Solution:

This problem requires matching nested parenthesized structures and the JavaScript regex engine does not support recursive expressions, therefore this problem cannot be solved in one whack using a single regex. However, the problem can be simplified into two parts each of which can be solved using a single JavaScript regex. The first regex matches innermost parenthesized units and the second validates a simplified logical statement (which has no parentheses).

Regex #1: Match innermost parenthesized unit.

The following regex matches one parenthesized unit, which consists of two or more number tokens, where numbers are all separated by either AND or OR with at least one space between the numbers and the logical operators. The regex is fully commented and formatted for easy readability in PHP free-spacing mode syntax:

$re_paren = '/
    # Match innermost "parenthesized unit".
    \(            # Start of innermost paren group.
    \s*           # Optional whitespace.
    [+-]?\d+      # First number token (required).
    (?:           # ANDs or ORs (required).
      (?:         # Either multiple AND separated values.
        \s+       # Required whitespace.
        AND       # Logical operator.
        \s+       # Required whitespace.
        [+-]?\d+  # Additional number.
      )+          # multiple AND separated values.
    | (?:         # Or multiple OR separated values.
        \s+       # Required whitespace.
        OR        # Logical operator.
        \s+       # Required whitespace.
        [+-]?\d+  # Additional number token.
      )+          # multiple OR separated values.
    )             # ANDs or ORs (required).
    \s*           # Optional whitespace.
    \)            # End of innermost paren group.
    /ix';

Regex #2: Validate simplified logical statement.

Here is a (nearly identical except for the boundary anchors) regular expression which validates a simplified logical statement (having only numbers and logical operators and no parentheses). Here it is in commented, free-spacing mode (PHP) syntax:

$re_valid = '/
    # Validate simple logical statement (no parens).
    ^             # Anchor to start of string.
    \s*           # Optional whitespace.
    [+-]?\d+      # First number token (required).
    (?:           # ANDs or ORs (required).
      (?:         # Either multiple AND separated values.
        \s+       # Required whitespace.
        AND       # Logical operator.
        \s+       # Required whitespace.
        [+-]?\d+  # Additional number.
      )+          # multiple AND separated values.
    | (?:         # Or multiple OR separated values.
        \s+       # Required whitespace.
        OR        # Logical operator.
        \s+       # Required whitespace.
        [+-]?\d+  # Additional number token.
      )+          # multiple OR separated values.
    )             # ANDs or ORs (required).
    \s*           # Optional whitespace.
    $             # Anchor to end of string.
    /ix';

Note that these two regexes are identical except for the boundary anchors.

A JavaScript Solution:

The following tested JavaScript function uses the above two regexes to solve the problem:

function isValidLogicalStatement(text) {
    var re_paren = /\(\s*[+-]?\d+(?:(?:\s+AND\s+[+-]?\d+)+|(?:\s+OR\s+[+-]?\d+)+)\s*\)/ig;
    var re_valid =  /^\s*[+-]?\d+(?:(?:\s+AND\s+[+-]?\d+)+|(?:\s+OR\s+[+-]?\d+)+)\s*$/ig;
    // Iterate from the inside out.
    while (text.search(re_paren) !== -1) {
        // Replace innermost parenthesized units with integer.
        text = text.replace(re_paren, "0");
    }
    if (text.search(re_valid) === 0) return true;
    return false;
}

The function uses an iterative technique to first match and replace the innermost parenthesized units, replacing each with a single number token, then check to see if the resulting statement (sans parentheses), is valid.

Addendum: 2012-11-06

In a comment to this answer, the OP now says that there must be spaces between the numbers and operators and that a number or parenthesized unit may NOT stand on its own. With these additional requirements in mind, I've updated the answer above.

Upvotes: 4

phant0m
phant0m

Reputation: 16905

Forget about regular expressions, they can't do it.

Parser generators to the rescue

With a parser generator, you can create grammar that is both understandable and maintainable.

Here is a parser generator for JavaScript with an online demo.

Grammar

From what I have understood, you want don't want any implicit precedence rules between AND and OR.

Here is an example of what it considers valid:

-1 OR 2 OR (2 AND 2 AND (2 OR (6 AND -2 AND (6 OR 2) AND (6 OR 2)) OR 2 OR 2))

At the moment, the grammar requires/supports

  • "infinite" nesting
  • explicit precedence control with parentheses for AND/OR
  • (multiple) negation of literals
  • whitespace between operands and operators

The grammar can easily be changed to

  • allow arbitrary whitespace
  • optional negation of literals instead of possible multiple negations
  • negation of any subexpression

If you want a more in-depth explanation or can't figure out how to tweak it exactly to your liking, just drop a comment.

Here is your grammar, Simply paste it into the online generator and click Download parser.

start
  = formula

formula
 = ors
 / ands
 / literal
 / parens_formula

parens_formula
 = "(" formula ")"

ors
 = operand (whitespace "OR" whitespace  operand)+

ands
 =  operand (whitespace "AND" whitespace operand)+

whitespace
 = " "+

operand
 = literal
 / parens_formula

literal
 = integer
 / "-" literal

integer "integer"
  = digits:[0-9]+ { return parseInt(digits.join(""), 10); }

Upvotes: 5

Related Questions