Valentin Tihomirov

Reputation: 1

A tutorial on Lexical parser

I am a bit stuck with parser combinators and whitespace. A keyword parser matches even when the keyword is only a prefix of the input, as in "keywordandtherestofthestream". Moreover, identifier = rep1("a") consumes both letters of "a a" as if they were a single "aa". Everything I have read suggests that I need to do some lexing first, and that it is possible to stack a parser combinator on top of the lexer.

I see that there is a special Lexical parser meant to serve this purpose. But why use this particular parser for tokenization? What is the point, and why is it more advantageous? What are its EOL and whitespace methods for? Are they related to the skipWhitespace flag that I see in RegexParsers? Moreover, I cannot find any example of stacking a parser on top of the lexer. It seems to me that the higher-level RegexParsers works on Input, which is a stream of characters. How can it be a stream of tokens?
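For reference, here is a minimal sketch of the stacking I mean, assuming the standard scala-parser-combinators library: StandardTokenParsers bundles a StdLexical lexer, so the parser's Input really is a stream of tokens rather than characters. The grammar and the run helper are my own illustration, not library API.

```scala
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

// StandardTokenParsers wraps a StdLexical lexer; parsers here consume
// tokens produced by that lexer, not raw characters.
object StackedParser extends StandardTokenParsers {
  lexical.reserved += "keyword"      // lexed as a Keyword token
  lexical.delimiters += ("=", ";")

  // "keywordandtherestofthestream" lexes as one Identifier token,
  // so a keyword parser no longer matches a mere prefix of it.
  def stmt: Parser[(String, String)] = ident ~ ("=" ~> ident) ^^ {
    case name ~ value => (name, value)
  }

  // Hypothetical helper: feed the lexer's Scanner (a token stream)
  // into the token-level parser.
  def run(s: String) = phrase(stmt)(new lexical.Scanner(s))
}
```

With this setup, the lexer decides where one token ends and the next begins, which is exactly the prefix problem described above.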

Btw, is it possible to build position tracking (line:col) into that?

Upvotes: 0

Views: 90

Answers (1)

jkinkead

Reputation: 4421

rep1 inside a RegexParsers will call skipWhitespace between the things it parses, meaning that you're getting a Seq with two "a"s in it. This is documented in RegexParsers.

If you want to get line:col data, have the result type of your parser extend Positional, and wrap the parser in a call to positioned:

import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.Positional

object Parser extends RegexParsers {
  case class MyType(value: String) extends Positional
  val myType: Parser[MyType] = positioned { "typey" ^^ { MyType.apply } }
}
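A hedged usage sketch, assuming the Parser object above: on a successful parse, positioned has filled in the result's pos field, which carries line and column.

```scala
// Hypothetical usage of the Parser object defined above.
Parser.parseAll(Parser.myType, "typey") match {
  case Parser.Success(t, _) => println(s"${t.pos.line}:${t.pos.column}") // line:col where the match began
  case failure              => println(failure)
}
```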

Upvotes: 1
