Reputation: 10287

Parsing Expression Grammars: detect next token?

I'm just getting started with PEGs, using PEG.js.

There's something I can't get my head around. I'm sure it's simple, but I'm struggling to understand the concept.

Consider this two-rule grammar:

name
  = name:.* {return name.join("")}

put
  = "put " a:name " into " b:name "." {put(a,b)}

I want to be able to pass this parser "put foo into bar." and cause put("foo","bar") to evaluate.

But PEG.js gives me this error: Expected " into " or any character but end of input found.

I think I could fix this problem if the regex for the name rule were more specific than .* but why does it have to be? Can't the parser be smart enough to look ahead and see that " into " is coming up, as well as the "." at the end?

How can I achieve what I'm looking for? Is this perhaps the difference between a "bottom-up" and "top-down" grammar?

Edit: The regex /put (.*) into (.*)/g works like I want -- if I pass it "put foo into bar", it gives me $1="foo" and $2="bar". I'm just asking if I can get this same functionality (taking the whole string into account before deciding where the token boundaries are) using PEGjs or Jison.

Upvotes: 0

Answers (2)

Fabrice Theytaz

Reputation: 315

Sorry for my bad English.

The .* in the name rule tries to read every remaining character, so the parser reaches the end of the input before it can match " into ".
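To illustrate, here is a minimal sketch in plain JavaScript of how a PEG-style `*` behaves: it consumes as much as it can and never gives characters back. This is not how PEG.js is implemented internally; `literal`, `anyChar`, `star`, and `sequence` are hypothetical helpers written only for this example.

```javascript
// Hypothetical helpers mimicking PEG semantics; not the PEG.js API.
// Each matcher takes (input, pos) and returns the new position, or -1 on failure.
function literal(text) {
  return function (input, pos) {
    return input.startsWith(text, pos) ? pos + text.length : -1;
  };
}

// Matches any single character, like "." in a PEG.
function anyChar(input, pos) {
  return pos < input.length ? pos + 1 : -1;
}

// PEG-style star: repeat until the inner matcher fails, then commit.
// Unlike a regex engine, it never backtracks to give characters back.
function star(matcher) {
  return function (input, pos) {
    var next;
    while ((next = matcher(input, pos)) !== -1) {
      pos = next;
    }
    return pos;
  };
}

// Runs the matchers in order; fails as soon as one of them fails.
function sequence(matchers) {
  return function (input, pos) {
    for (var i = 0; i < matchers.length; i++) {
      pos = matchers[i](input, pos);
      if (pos === -1) return -1;
    }
    return pos;
  };
}

var input = 'put foo into bar.';
var name = star(anyChar);
var put = sequence([literal('put '), name, literal(' into '), name, literal('.')]);

var afterName = name(input, 4); // the first `name` runs to the end of the input
var result = put(input, 0);     // fails: nothing is left for " into " to match
```

In this sketch `afterName` equals `input.length` and `result` is -1, which mirrors the "end of input found" error from the question.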

You can make the name rule more specific (any character except a space or a dot):

name = [^ .]+

This one works, but not for names that contain spaces.

put = "put " a:name " into " b:name "." {put(a,b);}

name = c:[^ .]+ {return c.join("");}

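For Node.js:

var PEG = require('pegjs');
var text = 'put foo into bar.';
// Grammar: an initializer that defines put(), followed by the two rules.
var grammar = '{function put(a,b){ console.log(a,b); }}\n' +
  'put = "put " a:name " into " b:name "." {put(a,b);}\n' +
  'name = c:[^ .]+ {return c.join("");}';
var parser = PEG.buildParser(grammar); // newer PEG.js releases (0.10+) call this generate
parser.parse(text);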

I don't have a good solution for spaces in names, but try something like this:

nameBeforeInto = (!" into ".)+

nameBeforeDot = [^.]+

put = "put " nameBeforeInto " into " nameBeforeDot "."

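Note that nameBeforeInto returns a nested array (one [lookahead result, character] pair per matched character), so it needs to be flattened before you can join it into a string.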
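As a rough illustration of what the negative lookahead does, here is a hand-written sketch in plain JavaScript. It is not code generated by PEG.js; `untilLiteral` and `parsePut` are hypothetical names used only here. The idea is the same as (!" into " .)+ : keep taking characters until the stop text is the next thing in the input.

```javascript
// Hypothetical helper, not part of PEG.js: mimics (!stop .)+
function untilLiteral(stop) {
  return function (input, pos) {
    var start = pos;
    // Take one character at a time while the stop text is not next.
    while (pos < input.length && !input.startsWith(stop, pos)) {
      pos++;
    }
    // "+" requires at least one character.
    return pos > start ? { text: input.slice(start, pos), pos: pos } : null;
  };
}

// Mirrors: put = "put " nameBeforeInto " into " nameBeforeDot "."
function parsePut(input) {
  if (!input.startsWith('put ')) return null;
  var a = untilLiteral(' into ')(input, 4);
  if (!a || !input.startsWith(' into ', a.pos)) return null;
  var b = untilLiteral('.')(input, a.pos + ' into '.length);
  if (!b || !input.startsWith('.', b.pos)) return null;
  // Like a PEG.js parser, require the whole input to be consumed.
  return b.pos + 1 === input.length ? [a.text, b.text] : null;
}

var simple = parsePut('put foo into bar.');
var spaced = parsePut('put foo bar into baz qux.');
```

Here `simple` is `["foo", "bar"]` and `spaced` is `["foo bar", "baz qux"]`, so names containing spaces are handled as long as the first name does not itself contain " into ".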

Fabrice

Upvotes: 0

Dale Stanbrough

Reputation: 433

I'm fairly sure that "themirror" is correct - the first rule will eat all the input. Try it without that rule.

Also you should have another rule to allow arbitrary spaces. I found this online...

_
  = [ \r\n\t]*

The underscore rule will match any number of whitespace characters (including none). Then you can rewrite your rule as...

put
   = "put" _ a:name _ "into" _  b:name _  "." {put(a,b)}
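For what it's worth, here is a small plain-JavaScript sketch of what the `_` rule does, assuming the same four whitespace characters. `skipWhitespace` is a hypothetical helper for illustration, not something PEG.js provides.

```javascript
// Hypothetical helper mirroring the `_` rule ([ \r\n\t]*); not generated by PEG.js.
function skipWhitespace(input, pos) {
  while (pos < input.length && ' \r\n\t'.indexOf(input.charAt(pos)) !== -1) {
    pos++;
  }
  return pos; // never fails: zero whitespace characters is a valid match
}

var afterPadded = skipWhitespace('put   foo', 3); // skips the three spaces after "put"
var afterNone = skipWhitespace('putfoo', 3);      // nothing to skip, but still succeeds
```

In this sketch `afterPadded` is 6 and `afterNone` is 3. Because `_` can match nothing, a rule such as "put" _ name would presumably also accept input like "putfoo"; if that matters, a one-or-more variant (for example [ \r\n\t]+) between the words might be a better fit.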

Upvotes: 1
