Lionel Chan
Lionel Chan

Reputation: 8069

Converting conditional equation from infix to prefix notation

In our application we allow users to write specific conditions and we allow them express the conditions using such notation:

(1 and 2 and 3 or 4)

Where each numeric number correspond to one specific rule/condition. Now the problem is, how should I convert it, such that the end result is something like this:

{
    "$or": [
        "$and": [1, 2, 3],
        4
    ]
}

One more example:

(1 or 2 or 3 and 4)

To:

{
    "$or": [
        1,
        2,
        "$and": [3, 4]
    ]
}



I have written 50 over lines of tokenizer that successfully tokenized the statement into tokens and validated using stack/peek algorithm, and the tokens looks like this:

["(", "1", "and", "2", "and", "3", "or", "4", ")"]

And now how should I convert this kind of "infix notation" into "prefix notation" with the rule that and takes precedence over or?

Some pointers or keywords are greatly appreciated! What I have now doesn't really lead me to what I needed at the moment.

Some researches so far:

EDIT

Also, user has the ability to specify any number of parentheses if they insist, such as like:

((1 or 3) and (2 or 4) or 5)

So it get translates to:

{
    "$or": [{
        $and": [
            "$or": [1, 3],
            "$or": [2, 4]
        },
        5
    ]
}

EDIT 2

I figured out the algorithm. Posted as an answer below. Thanks for helping!

Upvotes: 3

Views: 2066

Answers (3)

Lionel Chan
Lionel Chan

Reputation: 8069

Thanks for the guides guys, at least I came out with my own solution. Since this is my first time doing mathematical equation parsing, pardon me if I did it wrongly or inefficient, or help me spot the error:

Basically, here are the steps I made it happen:

  1. Before parsing, always validate the pattern. Throw error when something is wrong.
  2. Once validated, we do a infix notation to prefix notation conversion. This step requires "and" takes precedence over "or".
    1. Reverse the given pattern
    2. Do infix to postfix notation conversion. I dumb, I learn from this
    3. Do the reverse again
    4. The infix to prefix should be done at this stage
  3. Build a tree from the prefix notation such that
    1. A node always have, and maximum, two branch
    2. Traverse down until it reach full leaves
  4. Optimize the tree such that it merges similar operators together (such as multiple $and operators with child $and can be merged and form a shorter tree)
  5. Mix with the given criteria set, and all done!!

Working example can be found here: http://jsfiddle.net/chaoszcat/uGKYj/3/

Working code as below:

(function() {

    /**
     * This is a source example of my original question on
     * http://stackoverflow.com/questions/20986255/converting-conditional-equation-from-infix-to-prefix-notation
     * 
     * This is my solution and use it at your own risk
     * @author Lionel Chan <chaoszcat[at]gmail.com>
     */

    /**
     * isNumeric, from jQuery. Duplicated here to make this js code pure
     * @param {mix} n Test subject
     * @returns {boolean} true if it's numeric
     */
    function isNumeric(n) {
        return !isNaN(parseFloat(n))&&isFinite(n);
    }

    /**
     * Node class - represent a operator or numeric node
     * @param {string} token The token string, operator "and", "or", or numeric value
     */
    function Node(token) {
        this.parent = null;
        this.children = []; //one node has two children at most
        this.token = token;
        this.is_operator = token === 'and' || token === 'or';
        this.is_numeric = !this.is_operator;
        this.destroyed = false;
    }

    Node.prototype = {

        isOperator: function() { return this.is_operator;},
        isNumeric: function() { return this.is_numeric;},

        //While building tree, a node is full if there are two children
        isFull: function() {
            return this.children.length >= 2;
        },

        addChild: function(node) {
            node.parent = this;
            this.children.push(node);
        },

        hasParent: function() {
            return this.parent !== null;
        },

        indexOfChild: function(node) {
            for (var i = 0 ; i < this.children.length ; ++i) {
                if (this.children[i] === node) {
                    return i;
                }
            }
            return -1;
        },

        removeChild: function(node) {
            var idx = this.indexOfChild(node);
            if (idx >= 0) {
                this.children[idx].parent = null; //remove parent relationship
                this.children.splice(idx, 1); //splice it out
            }
        },

        /**
         * Pass my children to the target node, and destroy myself
         * 
         * @param {Node} node A target node
         */
        passChildrenTo: function(node) {
            for (var i = 0 ; i < this.children.length ; ++i) {
                node.addChild(this.children[i]);
            }
            this.destroy();
        },

        //Destroy this node
        destroy: function() {
            this.parent.removeChild(this);
            this.children = null;
            this.destroyed = true;
        }
    };

    /**
     * Tree class - node manipulation
     * @param {array} prefixTokens The converted, prefix-notated tokens
     */
    function Tree(prefixTokens) {
        this.buildTree(prefixTokens);
        //Optimize tree - so that the tree will merge multiple similar operators together
        this.optimize(this.root);
    }

    Tree.prototype = {
        root: null,

        //Reference to the deepest operator node in the tree for next attachment point
        deepestNode: null,

        /**
         * Render this tree with given criteria array
         * @param {array} crits
         * @returns {object} The built criteria
         */
        render: function(crits) {
            //After optimization, we build the criteria and that's all!
            return this.buildCriteria(this.root, crits);
        },

        /**
         * Build criteria from root node. Recursive
         * 
         * @param {Node} node
         * @param {array} crits
         * @returns {object} of criteria
         */
        buildCriteria: function(node, crits) {

            var output = {},
                label = '$'+node.token;

            output[label] = []; //cpnditions array

            for (var i = 0 ; i < node.children.length ; ++i) {
                if (node.children[i].isOperator()) {
                    output[label].push(this.buildCriteria(node.children[i], crits));
                }else{
                    output[label].push(crits[node.children[i].token-1]);
                }
            }
            return output;
        },

        /**
         * Optimize the tree, we can simplify nodes with same operator. Recursive
         * 
         * @param {Node} node
         * @void
         */
        optimize: function(node) {

            //note that node.children.length will keep changing since the swapping children will occur midway. Rescan is required
            for (var i = 0 ; i < node.children.length ; ++i) {
                if (node.children[i].isOperator()) {
                    this.optimize(node.children[i]);
                    if (node.children[i].token === node.token) {
                        node.children[i].passChildrenTo(node);
                        i = 0; //rescan this level whenever a swap occured
                    }
                }
            }
        },

        /**
         * Build tree from raw tokens
         * @param {array} tokens
         */
        buildTree: function(tokens) {
            for (var i = 0 ; i < tokens.length ; ++i) {
                this.addNode(new Node(tokens[i]));
            }
        },

        /**
         * Add node into tree
         * 
         * @param {Node} node
         */
        addNode: function(node) {

            //If no root? The first node is root
            if (this.root === null) {
                this.root = node;
                this.deepestNode = node;
                return;
            }

            //if deepestNode is full, traverse up until we find a node with capacity
            while(this.deepestNode && this.deepestNode.isFull()) {
                this.deepestNode = this.deepestNode.parent;
            }

            if (this.deepestNode) {
                this.deepestNode.addChild(node);
            }

            //If the current node is an operator, we move the deepestNode cursor to it
            if (node.isOperator()) {
                this.deepestNode = node;
            }
        }
    };

    /**
     * Main criteria parser
     */
    var CriteriaParser = {

        /**
         * Convert raw string of pattern (1 and 2 or 3) into the object of criteria pattern
         * 
         * @param {string} str The raw pattern
         * @param {array} crits The raw list of criteria
         * @returns {String|Boolean}
         */
        parse: function(str, crits) {
            var tokens = this.tokenize(str),
                validationResult = this.validate(tokens, crits),
                prefixNotation = '';

            //Once succeded, we proceed to convert it to prefix notation
            if (validationResult === true) {
                prefixNotation = this.infixToPrefix(tokens);
                return (new Tree(prefixNotation)).render(crits);
            }else{
                return validationResult;
            }
        },

        /**
         * Convert the infix notation of the pattern (1 and 2 or 3) into prefix notation "or and 1 2 3"
         * 
         * Note:
         * - and has higher precedence than or
         * 
         * Steps:
         * 1. Reverse the tokens array
         * 2. Do infix -> postfix conversion (http://www.cs.arizona.edu/classes/cs227/spring12/infix.pdf, http://scriptasylum.com/tutorials/infix_postfix/algorithms/infix-postfix/index.htm)
         * 3. Reverse the result
         * 
         * @param {array} tokens The tokenized tokens
         * @returns {array} prefix notation of pattern
         */
        infixToPrefix: function(tokens) {

            var reversedTokens = tokens.slice(0).reverse(), //slice to clone, so not to disturb the original array
                stack = [],
                output = [];

            //And since it's reversed, please regard "(" as closing bracket, and ")" as opening bracket
            do {
                var stackTop = stack.length > 0 ? stack[stack.length-1] : null,
                    token = reversedTokens.shift();

                if (token === 'and') {
                    while(stackTop === 'and') {
                        output.push(stack.pop());
                        stackTop = stack.length > 0 ? stack[stack.length-1] : null;
                    }
                    stack.push(token);
                    stackTop = token;
                }else if (token === 'or') {
                    while(stackTop === 'and' || stackTop === 'or') { //and has higher precedence, so it will be popped out
                        output.push(stack.pop());
                        stackTop = stack.length > 0 ? stack[stack.length-1] : null;
                    }
                    stack.push(token);
                    stackTop = token;
                }else if (token === '(') { //'(' is closing bracket in reversed tokens
                    while(stackTop !== ')' && stackTop !== undefined) { //keep looping until found a "open - )" bracket
                        output.push(stack.pop());
                        stackTop = stack.length > 0 ? stack[stack.length-1] : null;
                    }
                    stack.pop(); //remove the open ")" bracket
                    stackTop = stack.length > 0 ? stack[stack.length-1] : null;
                }else if (token === ')') { //')' is opening bracket in reversed tokens
                    stack.push(token);
                }else if (isNumeric(token)) {
                    output.push(token);
                }else if (token === undefined) {
                    // no more tokens. Just shift everything out from stack
                    while(stack.length) {
                        stackTop = stack.pop();

                        if (stackTop !== undefined && stackTop !== ')') {
                            output.push(stackTop);
                        }
                    }
                }

            }while(stack.length || reversedTokens.length);

            //Reverse output and we are done
            return output.reverse();
        },

        /**
         * Tokenized the provided pattern
         * @param {string} str The raw pattern from user
         * @returns {array} A tokenized array
         */
        tokenize: function(str) {
            var pattern = str.replace(/\s/g, ''), //remove all the spaces :) not needed
                tokens = pattern.split(''),
                tokenized = [];

            //Tokenize it and verify
            var token = null,
                next = null;

            //attempts to concatenate the "and" and "or" and numerics
            while (tokens.length > 0) {
                token = tokens.shift();
                next = tokens.length > 0 ? tokens[0] : null;

                if (token === '(' || token === ')') {
                    tokenized.push(token);
                }else if (token === 'a' && tokens.length >= 2 && tokens[0] === 'n' && tokens[1] === 'd') { //and
                    tokenized.push(token + tokens.shift() + tokens.shift());
                }else if (token === 'o' && tokens.length >= 1 && next === 'r') { //or
                    tokenized.push(token + tokens.shift());
                }else if (isNumeric(token)) {
                    while(isNumeric(next)) {
                        token += next;
                        tokens.shift(); //exhaust it
                        next = tokens.length > 0 ? tokens[0] : null;
                    }
                    tokenized.push(token);
                }else{
                    tokenized.push(token);
                }
            }

            return tokenized;
        },

        /**
         * Attempt to validate tokenized tokens
         * 
         * @param {array} tokens The tokenized tokens
         * @param {array} crits The user provided criteria set
         * @returns {Boolean|String} Returns boolean true if succeeded, string if error occured
         */
        validate: function(tokens, crits) {

            var valid = true,
                token = null,
                stack = [],
                nextToken = null,
                criteria_count = crits.length;

            for (var i = 0 ; i < tokens.length ; ++i) {

                token = tokens[i];
                nextToken = i < tokens.length - 1 ? tokens[i+1] : null;

                if (token === '(') {
                    stack.push('(');
                    if (!isNumeric(nextToken) && nextToken !== '(' && nextToken !== ')') {
                        throw 'Unexpected token "'+nextToken+'"';
                    }
                }else if (token === ')') {
                    if (stack.length > 0) {
                        stack.pop();
                    }else{
                        throw 'Unexpected closing bracket';
                    }
                    if (nextToken !== ')' && nextToken !== 'and' && nextToken !== 'or' && nextToken !== null) {
                        throw 'Unexpected token "'+nextToken+'"';
                    }
                }else if (token === 'and' || token === 'or') {
                    if (!isNumeric(nextToken) && nextToken !== '(') {
                        throw 'Unexpected token "'+nextToken+'"';
                    }
                }else if (isNumeric(token) && token <= criteria_count) {
                    if (nextToken !== ')' && nextToken !== 'and' && nextToken !== 'or') {
                        throw 'Unexpected token "'+nextToken+'"';
                    }
                }else{
                    //anything not recognized, die.
                    throw 'Unexpected token "'+token+'"';
                }
            }

            //Last step - check if we have all brackets closed
            if (valid && stack.length > 0) {
                throw 'Missing '+stack.length+' closing bracket';
            }

            return valid;
        }
    };

    //This is an example pattern and criteria set. Note that pattern numbers must match criteria numbers.
    var pattern = '((1 or 3) and (2 or 4) or 5)',
        crits = [
            1, 2, 3, 4, 5
        ];

    //lazy on the document on load. Just delay
    setTimeout(function() {

        var result;
        try {
            result = JSON.stringify(CriteriaParser.parse(pattern, crits), undefined, 4);
        }catch(e) {
            result = e;
        }

        var pre = document.createElement('pre');
        pre.innerHTML = result;
        document.body.appendChild(pre); 
    }, 10);

})();

Upvotes: 2

Łukasz Kidziński
Łukasz Kidziński

Reputation: 1623

First define semantics. In your first example you gave (1 and 2 and 3) or 4 interpretation but it can also be 1 and 2 and (3 or 4) so:

{
    "$and": [
        {"$or": [3,4] },
        [1,2]
    ]
}

Let's assume that and has higher priority. Then just go through list join all terms with and. Next, join all the rest with or.

Upvotes: 0

Thue
Thue

Reputation: 153

This is most easily done using a two step process. 1) Convert to syntax tree. 2) Convert syntax tree to prefix notation.

A syntax tree is basically the same as your prefix notation, just built using the data structures of your programming language.

The standard method to create a syntax tree is to use a LALR parser generator, which is available for most languages. LALR parsers are fast, powerful, and expressive. A LALR parser generator takes a .y file as input, and outputs a source code file for a parser in the programming language of your choice. So you run the LALR parser generator once to generate your parser.

(All programmers should use learn to use parser generators :). It is also smart to use a standard tokenizer, while I am guessing you have written your own :).)

The following is a .y-file to generate a LALR-parser for your mini-language. Running this .y file though a LALR parser generator will output the source for a LALR parser, which takes tokens as input and outputs a parse-tree (in the variable $root_tree). You need to have defined the parsetree_binaryop datastructure manually elsewhere.

%left AND.
%left OR.
start ::= expr(e). { $root_tree = e; }
expr(r) ::= expr(e1) AND expr(e2). { r = new parsetree_binaryop(e1, OP_AND, e2); }
expr(r) ::= expr(e1) OR expr(e2). { r = new parsetree_binaryop(e1, OP_OR, e2); }
expr(r) ::= LPAR expr(e) RPAR. { r = e; }

The "%left AND" means that AND is left-associative (we could have chosen right too, doesn't matter for AND and OR). That "%left AND" is mentioned before "%left OR" means that AND binds tighter than OR, and the generated parser will therefore do the right thing.

When you have the syntax tree the parser gives you, generating the text representation is easy.

Edit: this seems to be a LALR parser generator which outputs a parser in JavaScript: http://sourceforge.net/projects/jscc/

Upvotes: 1

Related Questions