Reputation: 149
I have a lexer grammar that defines a lexer that is used in two ways: to identify tokens for a syntax-aware editor, and to identify tokens for the parser. In the first case, the lexer should return comments and whitespace, but in the second case, the comments and whitespace are not wanted. Do I need two different lexer classes, each defined by its own variant of the grammar? Or can I accomplish this with a single lexer by using channels? How?
If I need two separate grammars, I assume I can factor out all the rules except for comments and whitespace, and then import those rules from that separate "common" grammar.
Upvotes: 0
Views: 50
Reputation: 53337
Usually you filter out tokens (such as whitespace) via token channels, or skip them entirely. Since this filtering is part of your grammar, you'd need two grammars if you want whitespace tokens in one use case and not in the other. And yes, you can import a base grammar containing all the common rules into specialized grammars that hold only the differences. You can even override rules: define, say, the whitespace rule in the base grammar and redefine it in your main grammar.
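A minimal sketch of that layout (grammar and rule names here are illustrative, not taken from your project): the base grammar skips whitespace and comments, and the editor-facing grammar imports it and overrides just those two rules, since rules defined in the importing grammar take precedence over imported ones.

```
// CommonLexerRules.g4 -- shared rules; skips whitespace/comments by default
lexer grammar CommonLexerRules;

ID      : [a-zA-Z_] [a-zA-Z_0-9]* ;
WS      : [ \t\r\n]+ -> skip ;       // discarded for the parser use case
COMMENT : '//' ~[\r\n]* -> skip ;    // discarded for the parser use case
```

```
// EditorLexer.g4 -- keeps whitespace/comments for the syntax-aware editor
lexer grammar EditorLexer;
import CommonLexerRules;

WS      : [ \t\r\n]+ ;       // override: emit whitespace tokens
COMMENT : '//' ~[\r\n]* ;    // override: emit comment tokens
```

The editor variant could also send these tokens to a channel (`-> channel(HIDDEN)`) instead of emitting them on the default channel, depending on how the editor consumes the token stream.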
But keep in mind that not filtering whitespace has consequences for all your other rules: you would then have to add whitespace handling explicitly to your parser rules everywhere. For instance:
blah: a or b;
versus
blah: a WS* or WS* b;
Upvotes: 1