Reputation: 10287
I'm just getting started with PEGs, using PEG.js.
There's something I can't get my head around...I'm sure it's simple but it's giving me a headache trying to understand the concept...
Consider this two-rule grammar:
name
= name:.* {return name.join("")}
put
= "put " a:name " into " b:name "." {put(a,b)}
I want to be able to pass this parser "put foo into bar." and have it evaluate put("foo","bar").
But PEG.js gives me this error: Expected " into " or any character but end of input found.
I think I could fix this problem if the pattern for the name rule were more specific than .* but why does it have to be? Can't the parser be smart enough to look ahead and see that " into " is coming up, as well as the "." at the end?
How can I achieve what I'm looking for? Is this perhaps the difference between a "bottom-up" and "top-down" grammar?
Edit:
The regex /put (.*) into (.*)/g works like I want: if I pass it "put foo into bar", it gives me $1="foo" and $2="bar". I'm just asking if I can get this same functionality (taking the whole string into account before deciding where the token boundaries are) using PEG.js or Jison.
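For reference, the regex behaviour described in the edit can be reproduced with a short snippet. This is only a sketch: the trailing \. is added here so that the pattern matches the same "put foo into bar." input that the grammar is given, and the variable names are illustrative.

```javascript
// The regex engine backtracks: the first (.*) initially grabs everything,
// then gives characters back until " into " can match.
const re = /put (.*) into (.*)\./;
const m = re.exec("put foo into bar.");

// Collect the two capture groups, or null if the input did not match.
const captured = m ? [m[1], m[2]] : null;
console.log(captured);
```

A PEG repetition never gives characters back in this way, which is why the same pattern written as a grammar behaves differently.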
Upvotes: 0
Views: 511
Reputation: 315
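Sorry for my bad English.
The .* in the name rule tries to read every remaining character, so the end of the input is reached before " into " can be matched. A PEG repetition is greedy and never gives characters back, unlike a regex.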
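The difference can be seen in a small hand-rolled sketch of PEG-style matching. This is not how PEG.js is implemented internally; the helper names (lit, anyStar, untilStar, seq) are made up for illustration, and each matcher simply returns the new position or -1 on failure.

```javascript
// Literal string match at the current position.
function lit(s) {
  return (input, pos) => (input.startsWith(s, pos) ? pos + s.length : -1);
}

// Models `.*`: consumes every remaining character and never gives any back.
function anyStar() {
  return (input, pos) => input.length;
}

// Models `(!stop .)*`: consumes one character at a time while `stop` does not match here.
function untilStar(stop) {
  return (input, pos) => {
    while (pos < input.length && stop(input, pos) === -1) pos++;
    return pos;
  };
}

// Sequence: run the parts in order and fail as soon as one fails,
// without retrying earlier parts with a shorter match.
function seq(...parts) {
  return (input, pos) => {
    for (const part of parts) {
      pos = part(input, pos);
      if (pos === -1) return -1;
    }
    return pos;
  };
}

const text = "put foo into bar.";
const greedy = seq(lit("put "), anyStar(), lit(" into "), anyStar(), lit("."));
const guarded = seq(lit("put "), untilStar(lit(" into ")), lit(" into "),
                    untilStar(lit(".")), lit("."));

const greedyResult = greedy(text, 0);   // fails: nothing is left for " into "
const guardedResult = guarded(text, 0); // succeeds: consumes the whole input
console.log(greedyResult, guardedResult);
```

The greedy version fails for the same reason the original grammar does, while the guarded version stops each name just before the text that must follow it.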
You can make the name rule more specific (any character except a space or a dot):
name = [^ .]+
This works, but not for names that contain spaces:
put = "put " a:name " into " b:name "." {put(a,b);}
name = c:[^ .]+ {return c.join("");}
For Node.js:
var PEG = require('pegjs');
var text = 'put foo into bar.';
var parser = PEG.buildParser('{function put(a,b){ console.log(a,b); }}put = "put " a:name " into " b:name "." {put(a,b);}\nname = c:[^ .]+ {return c.join("");}');
parser.parse(text);
I don't have a good solution for names that contain spaces, but try something like this:
nameBeforeInto = (!" into ".)+
nameBeforeDot = [^.]+
put = "put " nameBeforeInto " into " nameBeforeDot "."
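Note that nameBeforeInto returns a nested array (one inner array per matched character) rather than a flat list of characters, so it cannot simply be joined like the name rule above.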
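One way to turn that nested result back into a string is a small helper like the one below. It is a sketch that assumes each inner array has the form [predicateResult, character], which is what a sequence such as !" into " . produces in the PEG.js versions I am aware of; the helper name joinGuardedChars is hypothetical.

```javascript
// Assumption: `pairs` looks like [[undefined, "f"], [undefined, "o"], ...],
// i.e. the predicate result first and the matched character second.
function joinGuardedChars(pairs) {
  // Keep only the matched character from each pair, then concatenate.
  return pairs.map((pair) => pair[1]).join("");
}

const sample = [[undefined, "f"], [undefined, "o"], [undefined, "o"]];
const joined = joinGuardedChars(sample);
console.log(joined);
```

A function like this could be placed in the grammar's initializer block, next to put, and called from an action on nameBeforeInto.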
Fabrice
Upvotes: 0
Reputation: 433
I'm fairly sure that "themirror" is correct - the first rule will eat all the input. Try it without that rule.
Also you should have another rule to allow arbitrary spaces. I found this online...
_
= [ \r\n\t]*
The underscore will match any number of whitespace characters. Then you can rewrite your rule as...
put
= "put" _ a:name _ "into" _ b:name _ "." {put(a,b)}
Upvotes: 1