Lyubomir Dimov
Lyubomir Dimov

Reputation: 141

boolean prop matching

I have certain logical proposition and I want to check their validity with Regex in C#.

Each capital letter is a predicate. Formulas for predicate logic are built with predicates and connectives like ¬, ⇒, ⇔, ⋀ and ⋁. However user input should be in ASCII string notation, namely:

Logical notation       ASCII               
¬A                     ~(A)                 Negation

A ⇒ B                 >(A,B)               Implication

A ⇔ B                 =(A,B)               Bi-Implication

A ⋀ B                  &(A,B)               AND

A ⋁ B                  |(A,B)               OR

Furthermore True and False are represented with 0 and 1, like so : &(0,1)

Let's say I have following ASCII input

string input1 = "&(&(=(A,B),>(&(A,B),~(C))),>(A,~(&(A,B))))"; // valid
string input2 = "1"     // valid
string input3 = "=(~(A),>(>(B,C),~(A)))" // valid
string input4 = "(~(A))" // invalid because no connective in the beginning
string input5 = ">(A,B"  // invalid because no closing parenthesis

So, the ascii string should be either

  1. Single Predicate - A-Z or 0-1
  2. string starting with connective and containing two propositions separated by comma, those propositions could be either a single predicate or a connective with two propositions...

I came up with this:

Regex checkExpression = new Regex(
                   @"([&|>=]\(([A-Z0-1]{1}|\(.*\)),([A-Z0-1]{1}|\(.*\))\))
                                          |([~]\(([A-Z0-1]{1}|\(.*\))\))");

However, I am not very familiar with building Regular expressions, any help is appreciated.

Upvotes: 3

Views: 503

Answers (5)

Alex Collins
Alex Collins

Reputation: 570

I created a regex that can do this. The regex: ^([01A-Z](?![01A-Z])|(?<dyadic>[|&>=]\()|(?<comma-dyadic>\,)|(?<dBracket-comma>\))|(?<unary>~\()|(?<uBracket-unary>\)))+(?(dyadic)(?!))(?(comma)(?!))(?(unary)(?!))$

It is much cleaner and better in PCRE because you can do recursion ^([A-Z01])|([>=&|])\((?R),(?R)\)|~\((?R)\)$ but that is not available in C#'s flavor of regex.

I had to learn the balancing group from C# so you may need to look into that.

A breakdown of how the code works:

^                             # Start of line
(                             # Either
    [01A-Z](?![01A-Z])|       # A symbol or bool followed by anything else
    (?<dyadic>[|&>=]\((?!,))| # Start of dyadic
    (?<comma-dyadic>,(?!\)))| # Looks for comma followed by dyadic. Pops off the dyadic stack.
    (?<dBracket-comma>\))|    # Looks for ending bracket following comma. pops off comma stack. 
    (?<monadic>~\((?!\)))|    # Start of monadic function.
    (?<uBracket-monadic>\)))  # Looks for ending bracket for unary. Pops off the monadic stack. 
+                             # Any number of times.
(?(dyadic)(?!))               # Assert dyadic stack is empty. All have a comma.
(?(comma)(?!))                # Assert comma stack is empty. All dyadic commas followed by brackets.
(?(monadic)(?!))              # Assert monadic stack is empty. All monadic expressions have closing brackets.
$                             # End of line.

An example demo.

Update: Forgot to ensure there was a parameter in each function. A negative lookahead was added in 3 locations to fix this.

Update2: Made the regex only match single letter literals. Added a negative lookahead that checks if the letter or number is followed by a letter or number.

Upvotes: 2

SamWhan
SamWhan

Reputation: 8332

If I understand you correctly, you want to verify the syntax of a given proposition.

This could easily be done by looping and contracting each valid formula to a single predicate, say 1. Repeating this until the remainder is a single 1 will signal a valid proposition. Ending with No Match signals invalid proposition.

An illustration:

&(&(=(A,B),>(&(A,B),~(C))),>(A,~(&(A,B))))           Proposition
&(&(1,>(1,1)),>(A,~(1)))                             First iteration
&(&(1,1),>(A,1))                                     Second iteration
&(1,1)                                               Third iteration
1                                                    Fourth iteration

=(~(A),>(>(B,C),~(A)))                               Proposition
=(1,>(1,1))                                          First iteration
=(1,1)                                               Second iteration
1                                                    Third iteration

(~(A))                                               Proposition
(1)                                                  First iteration
            No Match

>(A,B                                                Proposition
            No Match

(Generated with regex101)

Your regex work, but I've simplified it slightly:

~\([A-Z0-1]\)|[&|>=]\([A-Z0-1],[A-Z0-1]\)

Here's a live demo at ideone.

Upvotes: 1

Nico
Nico

Reputation: 12683

As Richard has stated you should be using ASTs to manage the validation and actually you can also use this to start building your own language on C#. I have done this many times in the past for various projects and have used a pretty decent tool called "Irony.Net" where you design your grammar in code directly.

Irony is a development kit for implementing languages on .NET platform. Unlike most existing yacc/lex-style solutions Irony does not employ any scanner or parser code generation from grammar specifications written in a specialized meta-language. In Irony the target language grammar is coded directly in c# using operator overloading to express grammar constructs. Irony's scanner and parser modules use the grammar encoded as c# class to control the parsing process. Irony.Net CodePlex

With this have come up with a pretty basic grammar that seems to handle your cases below. However there is an odd case in your examples (or require further explanation)

  1. 1 is valid but is 0 valid?
  2. As above what about any of the predicates A-Z (upper case)?

Example grammar

[Language("Logical Proposition", "1.0", "")]
public class LogicalPropositionGrammar : Grammar
{
    public LogicalPropositionGrammar()
    {
        //syntax terminals
        var lpar = ToTerm("(");
        var rpar = ToTerm(")");
        var comma = ToTerm(",");
        var trueTerm = ToTerm("1") | "true";
        var falseTerm = ToTerm("0") | "false";

        //nonterms
        var predicate = new NonTerminal("Predicate");
        var connective = new NonTerminal("Connective");
        var pexp = new NonTerminal("PredExpression");
        var formula = new NonTerminal("Formula");
        var literal = new NonTerminal("Literal");
        var singleTerm = new NonTerminal("SingleTerm");
        var multiTerm = new NonTerminal("MultiTerm");

        //formulat non terms
        var negation = new NonTerminal("Negation");
        var implication = new NonTerminal("Implication");
        var biImplication = new NonTerminal("Bi-Implication");
        var andTerm = new NonTerminal("And");
        var orTerm = new NonTerminal("Or");

        literal.Rule = trueTerm | falseTerm;
        singleTerm.Rule = lpar + pexp + rpar; //single term is (pexp)
        multiTerm.Rule = lpar + pexp + comma + pexp + rpar; //mult term = (pexp, pexp)

        //formula rules
        negation.Rule = ToTerm("~") + singleTerm;
        implication.Rule = ToTerm(">") + multiTerm;
        biImplication.Rule = ToTerm("=") + multiTerm;
        andTerm.Rule = ToTerm("&") + multiTerm;
        orTerm.Rule = ToTerm("|") + multiTerm;

        //predicate terms
        predicate.Rule = ToTerm("A") | "B" | "C" | "D" | "E" | "F" | "G" |
                            "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" |
                            "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" |
                            "X" | "Y" | "Z" | literal;

        //predicate rule
        pexp.Rule = predicate | negation | implication | biImplication | andTerm | orTerm;

        //a base formulat
        formula.Rule = MakeStarRule(formula, pexp);

        RegisterOperators(10, "&", "~", ">", "=", "|");
        MarkPunctuation(",", "(", ")");
        MarkTransient(pexp, singleTerm);

        Root = formula;
    }
}

Example 1 parser

Upvotes: 4

Anton&#237;n Lejsek
Anton&#237;n Lejsek

Reputation: 6103

As has been said, regex is not good tool for this. Even if you could make it (I rather doubt that in this case), often you not only need to validate it, but also evaluate. And you really do not want to do that with regex. And while for regex approach this is a nightmare, for gramatics parser this is a piece of cake. If you need something lightweight, for simple cases you can even write it all on your own, LL(1) analyzers have quite descriptive code:

public class ParseException : Exception
{
    public ParseException(string message) : base(message) { }
}

public class Analyzer
{
    protected int position;
    protected string input;
    protected Dictionary<char, bool> predicates;

    public Analyzer(string input)
    {
        this.input = input;
    }

    public bool? Evaluate(Dictionary<char, bool> predicates = null)
    {
        position = 0;
        this.predicates = predicates;

        try
        {
            bool value = T();
            if (position == input.Length)
            {
                return value;
            }
        }
        catch (ParseException)
        {
        }

        return null;
    }

    protected char GetChar()
    {
        if (position >= input.Length)
        {
            throw new ParseException("Unexpected end of input");
        }
        return input[position++];
    }

    protected void MatchChar(char c)
    {
        if (GetChar() != c)
        {
            throw new ParseException("Invalid input");
        }
    }

    protected bool T()
    {
        char c = GetChar();
        if (c == '~')
        {
            MatchChar('(');
            bool val = T();
            MatchChar(')');
            return !val;
        }
        if (c == '>')
        {
            MatchChar('(');
            bool val1 = T();
            MatchChar(',');
            bool val2 = T();
            MatchChar(')');
            return val2 || !val1;
        }
        if (c == '=')
        {
            MatchChar('(');
            bool val1 = T();
            MatchChar(',');
            bool val2 = T();
            MatchChar(')');
            return val1 == val2;
        }
        if (c == '&')
        {
            MatchChar('(');
            bool val1 = T();
            MatchChar(',');
            bool val2 = T();
            MatchChar(')');
            return val1 && val2;
        }
        if (c == '|')
        {
            MatchChar('(');
            bool val1 = T();
            MatchChar(',');
            bool val2 = T();
            MatchChar(')');
            return val1 || val2;
        }
        if (c == '0')
        {
            return false;
        }
        if (c == '1')
        {
            return true;
        }
        if (c >= 'A' && c <= 'Z')
        {                   
            if (predicates == null) { return false; }
            if (predicates.TryGetValue(c, out bool val))
            {
                return val;
            }
            throw new ParseException("Predicate value not found");
        }

        throw new ParseException("Invalid input");
    }
}

You can test validity like this:

bool ok1 = new Analyzer(input1).Evaluate().HasValue;

And evaluate like this:

var values1 = new Dictionary<char, bool>() { ['A'] = true, ['B'] = false, ['C'] = true };
bool result1 = new Analyzer(input1).Evaluate(values1).Value;

Upvotes: 1

Richard Schneider
Richard Schneider

Reputation: 35477

Using regex for language parsing is possible but becomes very complex very fast.

I suggest using Abstract Syntax Trees (AST). I like ANTLR. A good intro can be found at ANTLR with C# – using an Abstract Syntax Tree (AST)

Upvotes: 2

Related Questions