Suppenterrine

Reputation: 55

Problem: Integration of Lexer Tokens in PEG Grammar in JavaScript

Motivation

I'm learning how to write and use a small lexer and parser in JavaScript. For the lexer, I chose to use the moo library, and for parsing, I decided to use Peggy. I was able to successfully match values from an array during lexing, which is crucial for me later.

Problem

I started with a simple grammar that works well, but the rules for "action" and "object" are hard-coded in it, and I want to use the tokens from my lexer in those rules instead. I've been struggling with this, and I've tried consulting ChatGPT, Google, and the documentation, but I haven't found any clear examples showing how to use lexer tokens in a Peggy grammar. If I have overlooked something in the documentation, I would appreciate a hint.

This is my JS code (test.js):

const peggy = require('peggy');
const moo = require('moo');

const objects = ['sword', 'key', 'lantern', 'book'];
const actions = ['drop', 'take', 'examine', 'inventory'];
const andOperator = ['and', '&'];
const orOperator = ['or', '\\|'];

let lexer = moo.compile({
  
  object: {
    match: new RegExp(objects.join('|')),
    type: 'object',
  },
  
  identifier: {
    match: /[a-z]+/,
    type: moo.keywords({
      action: actions,
    }),
  },

  and: {
    match: new RegExp(andOperator.join('|')),
  },

  or: {
    match: new RegExp(orOperator.join('|')),
  },
  
  whitespace: {match: /\s+/, lineBreaks: true},
});

const parser = require('./parser');

const input = 'take book';
console.log(parser.parse(input, {lexer}));

The grammar (grammar.peggy):

start
  = sentence

sentence
  = action _ object

_ "whitespace"
  = space:[ \t\n\r]*

action = "take"+
object = "book"+

Note: I generated the parser with the Peggy CLI (peggy .\grammar.peggy) and then renamed the output file to parser.js. That's why my program contains the line with require('./parser').

If I run this:

node .\test.js

I get:

[ [ 'take' ], [ ' ' ], [ 'book' ] ]

It's close to what I want, but as I said, I want to be able to use my lexer tokens in the grammar. I hope my problem is clear. The grammar should eventually accept every combination of actions and objects from my arrays.

Edit 13.03.2023

I chatted with the AIs, and at first they produced this solution for the grammar:

start
  = sentence

sentence
  = identifier _ object

_ "whitespace"
  = space:[ \t\n\r]*

identifier = $identifier{ type: "action" }
object = $object{ type: "object" }

With this syntax, Peggy throws an error like this:

Error parsing grammar
Maximum call stack size exceeded

Then I got the snippet:

start
  = sentence

sentence
  = identifier _ object

_ "whitespace"
  = space:[ \t\n\r]*

identifier = identifier:$[identifier]{ type: "action", value: identifier }
object = object:$[object]{ type: "object", value: object }

This got rid of the call stack error. But when I execute the program, I get this error:

C:\...\LexerTest\parser.js:189
  var peg$f0 = function(identifier) { type: "action", value: identifier };
                                                           ^

SyntaxError: Unexpected token ':'
    at Object.compileFunction (node:vm:360:18)
    at wrapSafe (node:internal/modules/cjs/loader:1055:15)
    at Module._compile (node:internal/modules/cjs/loader:1090:27)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1180:10)
    at Module.load (node:internal/modules/cjs/loader:1004:32)
    at Function.Module._load (node:internal/modules/cjs/loader:839:12)
    at Module.require (node:internal/modules/cjs/loader:1028:19)
    at require (node:internal/modules/cjs/helpers:102:18)
    at Object.<anonymous> (C:\...\LexerTest\test.js:34:16)
    at Module._compile (node:internal/modules/cjs/loader:1126:14)

So I suppose I should use the $ syntax to access the lexer tokens inside the grammar, but I don't understand how to use it exactly, or why the call stack error occurs.

Upvotes: 0

Views: 235

Answers (1)

Joe Hildebrand

Reputation: 10414

You can pass your set of tokens into the generated parser as options, like:

parser.parse(input, {
  objects,
  actions,
  andOperator,
  orOperator
});

Then you can see those values as properties of the options object, and use them in predicates in your grammar:

action = act:$[a-z]+ &{ return options.actions.includes(act); }

The $ in the grammar converts the default match result of [a-z]+, which is an array of characters like ['t', 'a', 'k', 'e'], into the string "take". The label act: gives that value a name that can be used in the predicates and actions associated with the rule. The &{} is a semantic predicate: the rule only matches if the code inside it returns a truthy value, and that code receives act as a parameter.
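Putting this together, a complete grammar built on that predicate technique might look like the sketch below. Note the assumptions: the trailing { return ... } actions (which make each rule return a clean string or object instead of Peggy's default sequence array) and the shape of the sentence result are my additions, extrapolated from the one-line rule above, not something confirmed by the original post.

```peggy
start
  = sentence

sentence
  = act:action _ obj:object { return { action: act, object: obj }; }

// Match a lowercase word, then accept it only if it appears
// in the actions array passed in via options.
action
  = act:$[a-z]+ &{ return options.actions.includes(act); } { return act; }

// Same idea for objects.
object
  = obj:$[a-z]+ &{ return options.objects.includes(obj); } { return obj; }

_ "whitespace"
  = [ \t\n\r]*
```

After regenerating the parser, parser.parse('take book', { actions, objects }) should produce { action: 'take', object: 'book' }, and an input like 'take xyz' fails to parse because the predicate in the object rule rejects it.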

Upvotes: 0
