Matt Bierner

Reputation: 65543

Parser Combinators: Handling Whitespace In Parser without excessive Backtracking

I am moving from a separate lexer and parser to a combined parser that operates on an array of characters. One problem is how to properly handle whitespace.

Problem

Take a language consisting of a sequence of characters 'a' and 'b'. Whitespace is allowed in the input but does not affect the meaning of the program.

My current approach to parse such a language is:

var token = function(p) {
    return attempt(next(
         parse.many(whitespace),
         p));
};

var a = token(char('a'));
var b = token(char('b'));

var prog = many(either(a, b));

This works, but requires unnecessary backtracking. For a program such as '___ba_a_b' (whitespace was getting stripped in the post, so let _ be a space), when matching 'b' the whitespace is parsed twice: first for a, and again for b after a fails. Simply removing attempt does not work, as either will never reach b if any whitespace has been consumed.
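To make the double parse concrete, here is a minimal hand-rolled sketch. This is not the actual library the question uses; the combinator names just mirror the ones above, and whitespace is instrumented with a counter. For the input '___b', the three leading spaces are consumed while trying a, thrown away when char('a') fails, and consumed all over again for b.

```javascript
// A parser here is a function
// (input, pos) -> {ok: true, value, pos} | {ok: false, pos}

function char(c) {
    return (input, pos) =>
        input[pos] === c ? { ok: true, value: c, pos: pos + 1 }
                         : { ok: false, pos };
}

function next(p, q) {
    // run p, then q; keep q's result
    return (input, pos) => {
        const r = p(input, pos);
        return r.ok ? q(input, r.pos) : r;
    };
}

function attempt(p) {
    // on failure, rewind to the starting position
    return (input, pos) => {
        const r = p(input, pos);
        return r.ok ? r : { ok: false, pos };
    };
}

function either(p, q) {
    // try q only if p failed without consuming input
    return (input, pos) => {
        const r = p(input, pos);
        return r.ok || r.pos !== pos ? r : q(input, pos);
    };
}

function many(p) {
    return (input, pos) => {
        const values = [];
        let r;
        while ((r = attempt(p)(input, pos)).ok) {
            values.push(r.value);
            pos = r.pos;
        }
        return { ok: true, value: values, pos };
    };
}

let whitespaceRuns = 0;
const whitespace = (input, pos) => {
    whitespaceRuns += 1;            // count every attempt to match a space
    return char(' ')(input, pos);
};

// the question's token: munch leading whitespace, wrap it all in attempt
const token = p => attempt(next(many(whitespace), p));

const a = token(char('a'));
const b = token(char('b'));
const prog = many(either(a, b));

const result = prog('   b', 0);
console.log(result.value.join(''));  // 'b'
console.log(whitespaceRuns);         // 10: each leading space is matched
                                     // once for a, then again for b
```

The count of 10 breaks down as four whitespace attempts for a (three matches plus the failure on 'b'), four more for b, and one each for a and b on the final empty iteration of many.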

Attempts

My first thought was to move the token call outside of the either:

var a = char('a');
var b = char('b');

var prog = many(token(either(a, b)));

This works, but now prog cannot be reused easily. In building a parser library, this seems to require defining parsers twice: one version that actually consumes 'a' or 'b' and can be used in other parsers, and one version that correctly handles whitespace. It also clutters up parser definitions by requiring them to have explicit knowledge of how each parser operates and how it handles whitespace.

Question

The intended behavior is that an arbitrary amount of whitespace can be consumed before a token. If parsing the token fails, the parser should backtrack to the start of the token instead of to the start of the whitespace.

How can this be expressed without preprocessing the input to produce a token stream? Are there any good, real world code examples of this? The closest I found was Higher-Order Functions for Parsing but this does not seem to address my specific concern.

Upvotes: 3

Views: 1445

Answers (1)

Matt Fenwick

Reputation: 49105

I solved this problem in a JSON parser that I built. Here are the key parts of my solution, in which I followed the 'write the parsers twice' approach somewhat:

  1. define the basic parsers for each token -- number, string, etc.

  2. define a token combinator that takes a basic token parser and returns a whitespace-munching parser. The munching should occur after the token, so that whitespace is only parsed once, as you noted:

    function token(parser) {
        // run `parser`, then consume ("munch") any trailing whitespace,
        // keeping `parser`'s result -- here seq2L keeps the left result
        // and munch is many(whitespace)
        return seq2L(parser, munch);
    }
    
  3. use the token combinator to produce the munching token parsers

  4. use the parsers from 3. to build the parsers for the rest of the language
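The four steps above can be sketched end-to-end with hand-rolled combinators. This is illustrative only, not the answerer's actual JSON parser; `seq2L` (keep the left result) and `munch` are assumed names:

```javascript
// A parser is a function
// (input, pos) -> {ok: true, value, pos} | {ok: false, pos}

function char(c) {
    return (input, pos) =>
        input[pos] === c ? { ok: true, value: c, pos: pos + 1 }
                         : { ok: false, pos };
}

function next(p, q) {
    // run p, then q; keep q's result
    return (input, pos) => {
        const r = p(input, pos);
        return r.ok ? q(input, r.pos) : r;
    };
}

function seq2L(p, q) {
    // run p, then q; keep p's result
    return (input, pos) => {
        const r = p(input, pos);
        if (!r.ok) return r;
        const s = q(input, r.pos);
        return s.ok ? { ok: true, value: r.value, pos: s.pos } : s;
    };
}

function either(p, q) {
    // try q only if p failed without consuming input
    return (input, pos) => {
        const r = p(input, pos);
        return r.ok || r.pos !== pos ? r : q(input, pos);
    };
}

function many(p) {
    return (input, pos) => {
        const values = [];
        let r;
        while ((r = p(input, pos)).ok) {
            values.push(r.value);
            pos = r.pos;
        }
        return { ok: true, value: values, pos };
    };
}

// step 1: basic parsers; step 2: the trailing-whitespace token combinator
const munch = many(char(' '));
const token = parser => seq2L(parser, munch);

// step 3: munching token parsers
const a = token(char('a'));
const b = token(char('b'));

// step 4: munch leading whitespace once at the start, then parse tokens.
// No attempt is needed: a failing token consumes nothing, so either can
// move on to its alternative, and each run of whitespace is parsed once.
const prog = next(munch, many(either(a, b)));

const parsed = prog('   ba ab', 0);
console.log(parsed.value.join(''));  // 'baab'
```

Note how the backtracking problem disappears: char('a') fails before consuming anything, so either falls through to b immediately, and the whitespace was already consumed by the previous token's munch.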

I didn't have a problem with having two similar versions of the parsers, since each version was in fact distinct. It was slightly annoying, but a tiny cost. You can check out the full code here.

Upvotes: 0

Related Questions