Reputation: 55
I'm learning how to write and use a small lexer and parser in JavaScript. For the lexer, I chose to use the moo library, and for parsing, I decided to use Peggy. I was able to successfully match values from an array during lexing, which is crucial for me later.
I started with a simple grammar that works well, but the rules for "action" and "object" are hard-coded in it, and I want those rules to use the tokens from my lexer instead. I have been struggling with this and have consulted ChatGPT, Google, and the documentation, but I haven't found a clear example that shows how to use lexer tokens in a Peggy grammar. If I have overlooked something in the documentation, I would appreciate a hint.
This is my JS code (test.js):
const peggy = require('peggy');
const moo = require('moo');
const objects = ['sword', 'key', 'lantern', 'book'];
const actions = ['drop', 'take', 'examine', 'inventory'];
const andOperator = ['and', '&'];
const orOperator = ['or', '\\|'];
let lexer = moo.compile({
  object: {
    match: new RegExp(objects.join('|')),
    type: 'object',
  },
  identifier: {
    match: /[a-z]+/,
    type: moo.keywords({
      action: actions,
    }),
  },
  and: {
    match: new RegExp(andOperator.join('|')),
  },
  or: {
    match: new RegExp(orOperator.join('|')),
  },
  whitespace: {match: /\s+/, lineBreaks: true},
});
const parser = require('./parser');
const input = 'take book';
console.log(parser.parse(input, {lexer}));
The grammar (grammar.peggy):
start
= sentence
sentence
= action _ object
_ "whitespace"
= space:[ \t\n\r]*
action = "take"+
object = "book"+
Note: I generated the parser from the grammar with the Peggy CLI (peggy .\grammar.peggy) and then renamed the output file to parser.js. That is why my program contains the line require('./parser').
If I run this:
node .\test.js
I get:
[ [ 'take' ], [ ' ' ], [ 'book' ] ]
It's kind of what I want, but as I said, I want to be able to use my lexer tokens in the grammar. I hope my problem is clear. It should be possible to use prompts like this with the grammar:
well, every combination from my arrays at this point.
I chatted with the AIs, and they first produced this grammar:
start
= sentence
sentence
= identifier _ object
_ "whitespace"
= space:[ \t\n\r]*
identifier = $identifier{ type: "action" }
object = $object{ type: "object" }
With this syntax, Peggy throws the following error:
Error parsing grammar
Maximum call stack size exceeded
Then I got the snippet:
start
= sentence
sentence
= identifier _ object
_ "whitespace"
= space:[ \t\n\r]*
identifier = identifier:$[identifier]{ type: "action", value: identifier }
object = object:$[object]{ type: "object", value: object }
This got rid of the call stack error.
But when I execute the program, I get this error:
C:\...\LexerTest\parser.js:189
var peg$f0 = function(identifier) { type: "action", value: identifier };
^
SyntaxError: Unexpected token ':'
at Object.compileFunction (node:vm:360:18)
at wrapSafe (node:internal/modules/cjs/loader:1055:15)
at Module._compile (node:internal/modules/cjs/loader:1090:27)
at Object.Module._extensions..js (node:internal/modules/cjs/loader:1180:10)
at Module.load (node:internal/modules/cjs/loader:1004:32)
at Function.Module._load (node:internal/modules/cjs/loader:839:12)
at Module.require (node:internal/modules/cjs/loader:1028:19)
at require (node:internal/modules/cjs/helpers:102:18)
at Object.<anonymous> (C:\...\LexerTest\test.js:34:16)
at Module._compile (node:internal/modules/cjs/loader:1126:14)
So I suppose I should use the $ syntax to access the lexer tokens inside the grammar, but I don't understand exactly how to use it, or why the call stack error occurs.
Upvotes: 0
Views: 235
Reputation: 10414
You can pass your set of tokens into the generated parser as options, like:
parser.parse(input, {
objects,
actions,
andOperator,
orOperator
});
Then you can see those values as properties of the options object, and use them in predicates in your grammar:
action = act:$[a-z]+ &{ return options.actions.includes(act); }
The $ in the grammar converts the default result of [a-z]+, which is ['t', 'a', 'k', 'e'], into the string "take". The label act gives that value a name for the predicates and actions associated with the rule. The predicate is contained within &{} and only allows the rule to match if the code returns a truthy value. That code receives act as a parameter.
Upvotes: 0