int_index

Reputation: 618

Omit the remaining input in Happy (parser generator for Haskell)

According to the Pascal grammar, a program ends with a dot, and Free Pascal (FPC/Lazarus) ignores any characters that follow it.

I want similar behavior. I use a custom monadic tokenizer, which is lazy, so I simply want Happy not to call the continuation once the main rule has succeeded.

Essentially, I would like something like this:

Program : Header Decls Body '.' SKIP_THE_REMAINING_INPUT { ... }

It is important that no tokenization happens at all after the final dot has been parsed, because tokenizing the trailing input could cause errors.
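To make the requirement concrete, here is a minimal plain-Haskell sketch, independent of Happy, of what "no tokenization after the dot" means for a lazy tokenizer. The names `Token`, `lexer` and `throughDot` are hypothetical illustrations, not part of Happy or of the asker's code; the lexer simply calls `error` on an invalid character, and because the token list is built lazily, that error is only raised if a consumer forces the token at that position.

```haskell
import Data.Char (isAlpha, isAlphaNum, isSpace)

-- Hypothetical token type for a tiny Pascal-like program "begin end."
data Token = TBegin | TEnd | TDot | TIdent String
  deriving (Eq, Show)

-- Hypothetical lazy tokenizer that aborts with 'error' on an invalid
-- character. Tokens are produced on demand, so the error is raised only
-- if someone actually forces the token at that position.
lexer :: String -> [Token]
lexer [] = []
lexer (c : cs)
  | isSpace c = lexer cs
  | c == '.' = TDot : lexer cs
  | isAlpha c =
      let (word, rest) = span (\x -> isAlphaNum x || x == '_') (c : cs)
       in keyword word : lexer rest
  | otherwise = error ("lexical error at " ++ show c)
  where
    keyword "begin" = TBegin
    keyword "end" = TEnd
    keyword w = TIdent w

-- The desired behavior: take tokens up to and including the first dot
-- and never inspect (and therefore never force) anything after it.
throughDot :: [Token] -> [Token]
throughDot [] = []
throughDot (TDot : _) = [TDot]
throughDot (t : ts) = t : throughDot ts
```

Here `throughDot (lexer "begin end.!")` evaluates to `[TBegin, TEnd, TDot]` without ever touching the `!`, whereas forcing the whole list (for example with `length`) would hit the lexical error.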

Upvotes: 3

Views: 183

Answers (1)

int_index

Reputation: 618

So I found the solution.

Happy has a feature called partial parsing. It is described in the documentation, though I discovered it by reading the git log of the source repository. It allows the parser to discard the remaining input, and it is enabled with the %partial directive in place of %name:

%name    parser {- normal  parser -}
%partial parser {- partial parser -}

However, it does not meet my second requirement, namely that the lazy tokenizer must not be forced to consume any further input: the partial parser needs exactly one more token after the final dot to verify that there is nothing more to parse.

Assume that ! is not a valid symbol, so the tokenizer fails on it, and consider the following inputs:

  1. begin end. valid_token!!!
  2. begin end.!

Parsing (1) succeeds because Happy takes valid_token as its one extra token and stops there, but parsing (2) fails because that extra token is required and the tokenizer is unable to produce it.
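The difference between the two inputs can be reproduced with a small pure-Haskell model. This is a simplified sketch, not Happy's generated code: the hypothetical `lexer` marks the point where tokenization fails with `Left c` and produces nothing after it, and the hypothetical `partialProgram` mimics the one-token lookahead a partial parser performs after the final dot.

```haskell
import Data.Char (isAlpha, isAlphaNum, isSpace)

-- Hypothetical token type for the grammar  Program : begin end '.'
data Token = TBegin | TEnd | TDot | TIdent String
  deriving (Eq, Show)

-- Hypothetical lexer: 'Left c' marks where tokenization fails on
-- character c; no tokens are produced after that point.
lexer :: String -> [Either Char Token]
lexer [] = []
lexer (c : cs)
  | isSpace c = lexer cs
  | c == '.' = Right TDot : lexer cs
  | isAlpha c =
      let (word, rest) = span (\x -> isAlphaNum x || x == '_') (c : cs)
       in Right (keyword word) : lexer rest
  | otherwise = [Left c]
  where
    keyword "begin" = TBegin
    keyword "end" = TEnd
    keyword w = TIdent w

-- Toy model of a partial parser: after the dot it still looks at one more
-- token (or end of input) before deciding the program cannot be extended.
partialProgram :: [Either Char Token] -> Either String ()
partialProgram (Right TBegin : Right TEnd : Right TDot : rest) = lookahead rest
  where
    lookahead [] = Right () -- end of input: accept
    lookahead (Right _ : _) = Right () -- a valid token: accept, ignore the rest
    lookahead (Left c : _) = Left ("lexical error at " ++ show c)
partialProgram _ = Left "syntax error"
```

With this model, `partialProgram (lexer "begin end. valid_token!!!")` is `Right ()`, while `partialProgram (lexer "begin end.!")` is `Left "lexical error at '!'"`, matching the behavior described for inputs (1) and (2).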

Apparently there is no way to change this behavior, so my workaround is to represent a lexical error as a special token that appears nowhere in the grammar. When the tokenizer encounters ! (or any other invalid character), it yields this error token instead of failing, so the partial parser always gets its one extra token and stops. This should also help with recovering from lexical errors.
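A minimal sketch of this workaround, again as a pure-Haskell model rather than actual Happy code. `TLexError` is a hypothetical name for the special token that no grammar rule mentions; because the lexer now emits it instead of failing, the lexer is total, and the one lookahead token a partial parser needs after the dot is always available. The `partialProgram` function is a hypothetical stand-in for the generated parser.

```haskell
import Data.Char (isAlpha, isAlphaNum, isSpace)

-- TLexError is the hypothetical error token; no grammar rule uses it.
data Token = TBegin | TEnd | TDot | TIdent String | TLexError Char
  deriving (Eq, Show)

-- Total lexer: an invalid character becomes TLexError instead of a failure.
lexer :: String -> [Token]
lexer [] = []
lexer (c : cs)
  | isSpace c = lexer cs
  | c == '.' = TDot : lexer cs
  | isAlpha c =
      let (word, rest) = span (\x -> isAlphaNum x || x == '_') (c : cs)
       in keyword word : lexer rest
  | otherwise = TLexError c : lexer cs
  where
    keyword "begin" = TBegin
    keyword "end" = TEnd
    keyword w = TIdent w

-- Toy model of a partial parser for  Program : begin end '.'
-- It inspects one token after the dot; since TLexError cannot extend the
-- program, seeing it there simply ends the parse successfully.
partialProgram :: [Token] -> Either String ()
partialProgram (TBegin : TEnd : TDot : rest) =
  case rest of
    [] -> Right () -- end of input
    (_ : _) -> Right () -- any lookahead token, including TLexError
partialProgram _ = Left "syntax error"
```

In this model, `begin end.!` now parses, and an invalid character inside the program (e.g. `begin ! end.`) surfaces as an ordinary syntax error rather than a tokenizer crash, which is where the error-recovery benefit would come from.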

Upvotes: 1
