Reputation: 3313
I'm back to exploring pegjs and clearly have not grasped the core concept yet. I'm trying to parse a "query language" that starts with a predicate and then a list of operands (which could include another predicate). So a simple example would be:
OR(
"string1"
"string2"
)
I would like the above to be transformed into:
{
  predicate: "OR",
  operands: [
    {
      type: "STRING",
      value: "string1"
    },
    {
      type: "STRING",
      value: "string2"
    }
  ]
}
This query:
OR(
"string1"
"string2"
AND (
"string4"
"string5"
)
"string3"
)
Would become this AST:
{
  predicate: "OR",
  operands: [
    {
      type: "STRING",
      value: "string1"
    },
    {
      type: "STRING",
      value: "string2"
    },
    {
      predicate: "AND",
      operands: [
        {
          type: "STRING",
          value: "string4"
        },
        {
          type: "STRING",
          value: "string5"
        }
      ]
    },
    {
      type: "STRING",
      value: "string3"
    }
  ]
}
My grammar comes close but has a couple of issues. Here is the current pegjs grammar; it can be pasted directly into the online pegjs parser (http://pegjs.majda.cz/online).
start =
  or_predicate

or_predicate
  = ws* "OR" ws* "(" ws* operands:or_predicate ws* ")" ws*
    { if(Array.isArray(operands)) {
        return {predicate: "OR", operands: operands}
      } else {
        return {predicate: "OR", operands: [operands] }
      }
    }
  / and_predicate

and_predicate
  = ws* "AND" ws* "(" operands:and_predicate ")"
    { if(Array.isArray(operands)) {
        return {predicate: "AND", operands: operands}
      } else {
        return {predicate: "AND", operands: [operands] }
      }
    }
  / operands

operands
  = ws* values:operand+ { return values; }

operand =
  string
  / ws or_predicate:or_predicate { return or_predicate; }

string =
  ws* "\"" value:valid_variable_characters "\""
  { return { type: "STRING", value: value.join("")}}

// List of valid characters for string variables
valid_variable_characters =
  [a-zA-Z0-9 _]+

ws =
  [ \t\n]
The above grammar handles the two examples I gave, but I noticed two problems, which lead me to the following three questions.
1. The grammar fails on this seemingly simple input (the key is that the nested OR comes immediately after the opening of the parent OR and the string comes at the end):
OR(
OR (
"string1"
)
"string2"
)
I'm not sure what is causing this or how to fix it.
2. The grammar currently has this goofy line for the operand rule:
operand =
  string
  / ws or_predicate:or_predicate { return or_predicate; }
Note the leading whitespace (ws) on the third line before or_predicate. Without that whitespace I get the error 'Maximum call stack size exceeded'. I think it has to do with left recursion, but I'm not positive about that. Ideally I'd like no 'ws' to be required there, so that a query with no spaces like this would work:
OR("string1"OR("string2")"string3")
Right now you have to artificially add some extra whitespace, like this:
OR("string1" OR("string2") "string3")
3. Am I approaching this grammar completely incorrectly? This is only the second one I've attempted, and the first was based off the pegjs arithmetic example, so I realize I could be going about this completely wrong, and that might be why I'm running into these issues.
Thank you for your assistance and time!
Best Regards,
Ed
Upvotes: 3
Views: 1489
Reputation: 2388
I'm also quite new to PEG, but you get the hang of it mainly by looking at examples rather than by reading documentation.
Try comparing your version with this one:
start
  = ws* predicate:predicate ws* { return predicate; }

predicate
  = "OR" ws* "(" operands:operand+ ")" { return { predicate: 'OR', operands: operands }; }
  / "AND" ws* "(" operands:operand+ ")" { return { predicate: 'AND', operands: operands }; }

operand
  = ws* predicate:predicate ws* { return predicate; }
  / ws* string:string ws* { return string; }

string
  = "\"" chars:valid_variable_characters+ "\"" { return { type: "STRING", value: chars.join("")}}

valid_variable_characters = [a-zA-Z0-9 _]

ws = [ \t\n]
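One difference to look for when comparing: in the question's grammar, operand can try or_predicate, which falls through to and_predicate, then to operands, and then back to operand, all without consuming any input. That is indirect left recursion, and it is most likely what produces 'Maximum call stack size exceeded'; the required ws only masks it by consuming one character first. In the version above, every path back to operand goes through a literal "OR" or "AND" and a "(". The loop can be sketched in plain JavaScript. This is a hand-written illustration, not PEG.js output: the function names are hypothetical stand-ins for the question's rules, whitespace handling is left out, and each function only reports whether its rule could start at a position.

```javascript
// Hypothetical stand-ins for the question's rules, with the required leading
// `ws` dropped from `operand`. Only the first check of each rule is modelled.

function startsWithAt(input, pos, literal) {
  return input.startsWith(literal, pos);
}

function orPredicate(input, pos) {
  if (startsWithAt(input, pos, "OR")) return true;  // "OR" "(" ... branch
  return andPredicate(input, pos);                   // "/ and_predicate"
}

function andPredicate(input, pos) {
  if (startsWithAt(input, pos, "AND")) return true; // "AND" "(" ... branch
  return operands(input, pos);                       // "/ operands"
}

function operands(input, pos) {
  return operand(input, pos);                        // first item of operand+
}

function operand(input, pos) {
  if (startsWithAt(input, pos, '"')) return true;   // string branch
  return orPredicate(input, pos);                    // same pos, nothing consumed
}

// Wrap the call so a runaway recursion is reported instead of crashing.
function tryOperandAt(input, pos) {
  try {
    return operand(input, pos) ? "matched" : "no match";
  } catch (err) {
    return err instanceof RangeError ? "stack overflow" : "other error";
  }
}

console.log(tryOperandAt('OR("string1")', 3));  // position of the opening quote
console.log(tryOperandAt('OR("string1")', 12)); // position of the closing ")"
```

The first call reports a match. The second starts at ")", where no string or keyword begins, so the four functions keep calling each other at the same position until the stack runs out. In the real grammar this happens on any input, because operand+ always probes for one more operand after the last one.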
Whitespace is optional in this grammar, so the following input parses:
OR("str1"OR("str2""str3"AND("str4""str5"))"str6")
Gives:
{
  "predicate": "OR",
  "operands": [
    {
      "type": "STRING",
      "value": "str1"
    },
    {
      "predicate": "OR",
      "operands": [
        {
          "type": "STRING",
          "value": "str2"
        },
        {
          "type": "STRING",
          "value": "str3"
        },
        {
          "predicate": "AND",
          "operands": [
            {
              "type": "STRING",
              "value": "str4"
            },
            {
              "type": "STRING",
              "value": "str5"
            }
          ]
        }
      ]
    },
    {
      "type": "STRING",
      "value": "str6"
    }
  ]
}
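To make the structure of the grammar above easier to follow, here is a rough hand-written recursive-descent equivalent in JavaScript. It is only a sketch of how I read the rules, not what PEG.js generates: the function names (skipWs, parseString, parsePredicate, parseOperand, parse) and the { node, pos } return shape are made up for illustration, and the two predicate alternatives are folded into one function.

```javascript
// Hand-written mirror of the grammar above. Each parse function takes the
// input and a position and returns { node, pos } on success or null on failure.

const WS = /[ \t\n]/;
const STRING_CHAR = /[a-zA-Z0-9 _]/;

// ws*
function skipWs(input, pos) {
  while (pos < input.length && WS.test(input[pos])) pos++;
  return pos;
}

// string = "\"" valid_variable_characters+ "\""
function parseString(input, pos) {
  if (input[pos] !== '"') return null;
  let end = pos + 1;
  while (end < input.length && STRING_CHAR.test(input[end])) end++;
  // Needs at least one character and a closing quote.
  if (end === pos + 1 || input[end] !== '"') return null;
  return { node: { type: "STRING", value: input.slice(pos + 1, end) }, pos: end + 1 };
}

// predicate = ("OR" / "AND") ws* "(" operand+ ")"
function parsePredicate(input, pos) {
  const name = ["OR", "AND"].find((kw) => input.startsWith(kw, pos));
  if (!name) return null;
  pos = skipWs(input, pos + name.length);
  if (input[pos] !== "(") return null;
  pos++;
  const operands = [];
  for (let r = parseOperand(input, pos); r !== null; r = parseOperand(input, pos)) {
    operands.push(r.node);
    pos = r.pos;
  }
  if (operands.length === 0 || input[pos] !== ")") return null;
  return { node: { predicate: name, operands }, pos: pos + 1 };
}

// operand = ws* predicate ws* / ws* string ws*
function parseOperand(input, pos) {
  const start = skipWs(input, pos);
  const r = parsePredicate(input, start) || parseString(input, start);
  return r && { node: r.node, pos: skipWs(input, r.pos) };
}

// start = ws* predicate ws*, and the whole input must be consumed
function parse(input) {
  const r = parsePredicate(input, skipWs(input, 0));
  if (r === null || skipWs(input, r.pos) !== input.length) {
    throw new SyntaxError("input does not match the grammar");
  }
  return r.node;
}

// The input from question 1: a nested OR first, then a string.
const ast = parse('OR(\nOR (\n"string1"\n)\n"string2"\n)');
console.log(JSON.stringify(ast, null, 2));
```

This also suggests why the input from question 1 failed with the original grammar, as far as I can tell: there the parentheses of OR wrap a single or_predicate, so when the first operand is itself an OR(...), that nested predicate matches on its own and the parser then expects ")" where "string2" appears. Here the parentheses always wrap a list of operands, and a nested predicate is just one more operand in that list.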
Upvotes: 6