Reputation: 8069
In our application we allow users to write specific conditions and we allow them express the conditions using such notation:
(1 and 2 and 3 or 4)
Where each numeric number correspond to one specific rule/condition. Now the problem is, how should I convert it, such that the end result is something like this:
{
"$or": [
"$and": [1, 2, 3],
4
]
}
One more example:
(1 or 2 or 3 and 4)
To:
{
"$or": [
1,
2,
"$and": [3, 4]
]
}
I have written 50 over lines of tokenizer that successfully tokenized the statement into tokens and validated using stack/peek algorithm, and the tokens looks like this:
["(", "1", "and", "2", "and", "3", "or", "4", ")"]
And now how should I convert this kind of "infix notation" into "prefix notation" with the rule that and
takes precedence over or
?
Some pointers or keywords are greatly appreciated! What I have now doesn't really lead me to what I needed at the moment.
Some researches so far:
EDIT
Also, user has the ability to specify any number of parentheses if they insist, such as like:
((1 or 3) and (2 or 4) or 5)
So it get translates to:
{
"$or": [{
$and": [
"$or": [1, 3],
"$or": [2, 4]
},
5
]
}
EDIT 2
I figured out the algorithm. Posted as an answer below. Thanks for helping!
Upvotes: 3
Views: 2066
Reputation: 8069
Thanks for the guides guys, at least I came out with my own solution. Since this is my first time doing mathematical equation parsing, pardon me if I did it wrongly or inefficient, or help me spot the error:
Basically, here are the steps I made it happen:
$and
operators with child $and
can be merged and form a shorter tree)Working example can be found here: http://jsfiddle.net/chaoszcat/uGKYj/3/
Working code as below:
(function() {
/**
* This is a source example of my original question on
* http://stackoverflow.com/questions/20986255/converting-conditional-equation-from-infix-to-prefix-notation
*
* This is my solution and use it at your own risk
* @author Lionel Chan <chaoszcat[at]gmail.com>
*/
/**
* isNumeric, from jQuery. Duplicated here to make this js code pure
* @param {mix} n Test subject
* @returns {boolean} true if it's numeric
*/
function isNumeric(n) {
return !isNaN(parseFloat(n))&&isFinite(n);
}
/**
* Node class - represent a operator or numeric node
* @param {string} token The token string, operator "and", "or", or numeric value
*/
function Node(token) {
this.parent = null;
this.children = []; //one node has two children at most
this.token = token;
this.is_operator = token === 'and' || token === 'or';
this.is_numeric = !this.is_operator;
this.destroyed = false;
}
Node.prototype = {
isOperator: function() { return this.is_operator;},
isNumeric: function() { return this.is_numeric;},
//While building tree, a node is full if there are two children
isFull: function() {
return this.children.length >= 2;
},
addChild: function(node) {
node.parent = this;
this.children.push(node);
},
hasParent: function() {
return this.parent !== null;
},
indexOfChild: function(node) {
for (var i = 0 ; i < this.children.length ; ++i) {
if (this.children[i] === node) {
return i;
}
}
return -1;
},
removeChild: function(node) {
var idx = this.indexOfChild(node);
if (idx >= 0) {
this.children[idx].parent = null; //remove parent relationship
this.children.splice(idx, 1); //splice it out
}
},
/**
* Pass my children to the target node, and destroy myself
*
* @param {Node} node A target node
*/
passChildrenTo: function(node) {
for (var i = 0 ; i < this.children.length ; ++i) {
node.addChild(this.children[i]);
}
this.destroy();
},
//Destroy this node
destroy: function() {
this.parent.removeChild(this);
this.children = null;
this.destroyed = true;
}
};
/**
* Tree class - node manipulation
* @param {array} prefixTokens The converted, prefix-notated tokens
*/
function Tree(prefixTokens) {
this.buildTree(prefixTokens);
//Optimize tree - so that the tree will merge multiple similar operators together
this.optimize(this.root);
}
Tree.prototype = {
root: null,
//Reference to the deepest operator node in the tree for next attachment point
deepestNode: null,
/**
* Render this tree with given criteria array
* @param {array} crits
* @returns {object} The built criteria
*/
render: function(crits) {
//After optimization, we build the criteria and that's all!
return this.buildCriteria(this.root, crits);
},
/**
* Build criteria from root node. Recursive
*
* @param {Node} node
* @param {array} crits
* @returns {object} of criteria
*/
buildCriteria: function(node, crits) {
var output = {},
label = '$'+node.token;
output[label] = []; //cpnditions array
for (var i = 0 ; i < node.children.length ; ++i) {
if (node.children[i].isOperator()) {
output[label].push(this.buildCriteria(node.children[i], crits));
}else{
output[label].push(crits[node.children[i].token-1]);
}
}
return output;
},
/**
* Optimize the tree, we can simplify nodes with same operator. Recursive
*
* @param {Node} node
* @void
*/
optimize: function(node) {
//note that node.children.length will keep changing since the swapping children will occur midway. Rescan is required
for (var i = 0 ; i < node.children.length ; ++i) {
if (node.children[i].isOperator()) {
this.optimize(node.children[i]);
if (node.children[i].token === node.token) {
node.children[i].passChildrenTo(node);
i = 0; //rescan this level whenever a swap occured
}
}
}
},
/**
* Build tree from raw tokens
* @param {array} tokens
*/
buildTree: function(tokens) {
for (var i = 0 ; i < tokens.length ; ++i) {
this.addNode(new Node(tokens[i]));
}
},
/**
* Add node into tree
*
* @param {Node} node
*/
addNode: function(node) {
//If no root? The first node is root
if (this.root === null) {
this.root = node;
this.deepestNode = node;
return;
}
//if deepestNode is full, traverse up until we find a node with capacity
while(this.deepestNode && this.deepestNode.isFull()) {
this.deepestNode = this.deepestNode.parent;
}
if (this.deepestNode) {
this.deepestNode.addChild(node);
}
//If the current node is an operator, we move the deepestNode cursor to it
if (node.isOperator()) {
this.deepestNode = node;
}
}
};
/**
* Main criteria parser
*/
var CriteriaParser = {
/**
* Convert raw string of pattern (1 and 2 or 3) into the object of criteria pattern
*
* @param {string} str The raw pattern
* @param {array} crits The raw list of criteria
* @returns {String|Boolean}
*/
parse: function(str, crits) {
var tokens = this.tokenize(str),
validationResult = this.validate(tokens, crits),
prefixNotation = '';
//Once succeded, we proceed to convert it to prefix notation
if (validationResult === true) {
prefixNotation = this.infixToPrefix(tokens);
return (new Tree(prefixNotation)).render(crits);
}else{
return validationResult;
}
},
/**
* Convert the infix notation of the pattern (1 and 2 or 3) into prefix notation "or and 1 2 3"
*
* Note:
* - and has higher precedence than or
*
* Steps:
* 1. Reverse the tokens array
* 2. Do infix -> postfix conversion (http://www.cs.arizona.edu/classes/cs227/spring12/infix.pdf, http://scriptasylum.com/tutorials/infix_postfix/algorithms/infix-postfix/index.htm)
* 3. Reverse the result
*
* @param {array} tokens The tokenized tokens
* @returns {array} prefix notation of pattern
*/
infixToPrefix: function(tokens) {
var reversedTokens = tokens.slice(0).reverse(), //slice to clone, so not to disturb the original array
stack = [],
output = [];
//And since it's reversed, please regard "(" as closing bracket, and ")" as opening bracket
do {
var stackTop = stack.length > 0 ? stack[stack.length-1] : null,
token = reversedTokens.shift();
if (token === 'and') {
while(stackTop === 'and') {
output.push(stack.pop());
stackTop = stack.length > 0 ? stack[stack.length-1] : null;
}
stack.push(token);
stackTop = token;
}else if (token === 'or') {
while(stackTop === 'and' || stackTop === 'or') { //and has higher precedence, so it will be popped out
output.push(stack.pop());
stackTop = stack.length > 0 ? stack[stack.length-1] : null;
}
stack.push(token);
stackTop = token;
}else if (token === '(') { //'(' is closing bracket in reversed tokens
while(stackTop !== ')' && stackTop !== undefined) { //keep looping until found a "open - )" bracket
output.push(stack.pop());
stackTop = stack.length > 0 ? stack[stack.length-1] : null;
}
stack.pop(); //remove the open ")" bracket
stackTop = stack.length > 0 ? stack[stack.length-1] : null;
}else if (token === ')') { //')' is opening bracket in reversed tokens
stack.push(token);
}else if (isNumeric(token)) {
output.push(token);
}else if (token === undefined) {
// no more tokens. Just shift everything out from stack
while(stack.length) {
stackTop = stack.pop();
if (stackTop !== undefined && stackTop !== ')') {
output.push(stackTop);
}
}
}
}while(stack.length || reversedTokens.length);
//Reverse output and we are done
return output.reverse();
},
/**
* Tokenized the provided pattern
* @param {string} str The raw pattern from user
* @returns {array} A tokenized array
*/
tokenize: function(str) {
var pattern = str.replace(/\s/g, ''), //remove all the spaces :) not needed
tokens = pattern.split(''),
tokenized = [];
//Tokenize it and verify
var token = null,
next = null;
//attempts to concatenate the "and" and "or" and numerics
while (tokens.length > 0) {
token = tokens.shift();
next = tokens.length > 0 ? tokens[0] : null;
if (token === '(' || token === ')') {
tokenized.push(token);
}else if (token === 'a' && tokens.length >= 2 && tokens[0] === 'n' && tokens[1] === 'd') { //and
tokenized.push(token + tokens.shift() + tokens.shift());
}else if (token === 'o' && tokens.length >= 1 && next === 'r') { //or
tokenized.push(token + tokens.shift());
}else if (isNumeric(token)) {
while(isNumeric(next)) {
token += next;
tokens.shift(); //exhaust it
next = tokens.length > 0 ? tokens[0] : null;
}
tokenized.push(token);
}else{
tokenized.push(token);
}
}
return tokenized;
},
/**
* Attempt to validate tokenized tokens
*
* @param {array} tokens The tokenized tokens
* @param {array} crits The user provided criteria set
* @returns {Boolean|String} Returns boolean true if succeeded, string if error occured
*/
validate: function(tokens, crits) {
var valid = true,
token = null,
stack = [],
nextToken = null,
criteria_count = crits.length;
for (var i = 0 ; i < tokens.length ; ++i) {
token = tokens[i];
nextToken = i < tokens.length - 1 ? tokens[i+1] : null;
if (token === '(') {
stack.push('(');
if (!isNumeric(nextToken) && nextToken !== '(' && nextToken !== ')') {
throw 'Unexpected token "'+nextToken+'"';
}
}else if (token === ')') {
if (stack.length > 0) {
stack.pop();
}else{
throw 'Unexpected closing bracket';
}
if (nextToken !== ')' && nextToken !== 'and' && nextToken !== 'or' && nextToken !== null) {
throw 'Unexpected token "'+nextToken+'"';
}
}else if (token === 'and' || token === 'or') {
if (!isNumeric(nextToken) && nextToken !== '(') {
throw 'Unexpected token "'+nextToken+'"';
}
}else if (isNumeric(token) && token <= criteria_count) {
if (nextToken !== ')' && nextToken !== 'and' && nextToken !== 'or') {
throw 'Unexpected token "'+nextToken+'"';
}
}else{
//anything not recognized, die.
throw 'Unexpected token "'+token+'"';
}
}
//Last step - check if we have all brackets closed
if (valid && stack.length > 0) {
throw 'Missing '+stack.length+' closing bracket';
}
return valid;
}
};
//This is an example pattern and criteria set. Note that pattern numbers must match criteria numbers.
var pattern = '((1 or 3) and (2 or 4) or 5)',
crits = [
1, 2, 3, 4, 5
];
//lazy on the document on load. Just delay
setTimeout(function() {
var result;
try {
result = JSON.stringify(CriteriaParser.parse(pattern, crits), undefined, 4);
}catch(e) {
result = e;
}
var pre = document.createElement('pre');
pre.innerHTML = result;
document.body.appendChild(pre);
}, 10);
})();
Upvotes: 2
Reputation: 1623
First define semantics. In your first example you gave (1 and 2 and 3) or 4
interpretation but it can also be 1 and 2 and (3 or 4)
so:
{
"$and": [
{"$or": [3,4] },
[1,2]
]
}
Let's assume that and
has higher priority. Then just go through list join all terms with and
. Next, join all the rest with or
.
Upvotes: 0
Reputation: 153
This is most easily done using a two step process. 1) Convert to syntax tree. 2) Convert syntax tree to prefix notation.
A syntax tree is basically the same as your prefix notation, just built using the data structures of your programming language.
The standard method to create a syntax tree is to use a LALR parser generator, which is available for most languages. LALR parsers are fast, powerful, and expressive. A LALR parser generator takes a .y file as input, and outputs a source code file for a parser in the programming language of your choice. So you run the LALR parser generator once to generate your parser.
(All programmers should use learn to use parser generators :). It is also smart to use a standard tokenizer, while I am guessing you have written your own :).)
The following is a .y-file to generate a LALR-parser for your mini-language. Running this .y file though a LALR parser generator will output the source for a LALR parser, which takes tokens as input and outputs a parse-tree (in the variable $root_tree). You need to have defined the parsetree_binaryop datastructure manually elsewhere.
%left AND.
%left OR.
start ::= expr(e). { $root_tree = e; }
expr(r) ::= expr(e1) AND expr(e2). { r = new parsetree_binaryop(e1, OP_AND, e2); }
expr(r) ::= expr(e1) OR expr(e2). { r = new parsetree_binaryop(e1, OP_OR, e2); }
expr(r) ::= LPAR expr(e) RPAR. { r = e; }
The "%left AND" means that AND is left-associative (we could have chosen right too, doesn't matter for AND and OR). That "%left AND" is mentioned before "%left OR" means that AND binds tighter than OR, and the generated parser will therefore do the right thing.
When you have the syntax tree the parser gives you, generating the text representation is easy.
Edit: this seems to be a LALR parser generator which outputs a parser in JavaScript: http://sourceforge.net/projects/jscc/
Upvotes: 1