Valentin Tihomirov

Reputation: 1

A tutorial on Lexical parser

I am a bit stuck with parser combinators and whitespace. A keyword parser matches even when the keyword is only a prefix of the input, as in "keywordandtherestofthestream". Moreover, identifier = rep1("a") consumes both letters of "a a" as if they were a single "aa". Everything I have read suggests that I need to do some lexing first, and that it is possible to stack a parser combinator on top of the lexer.

I see that there is a special Lexical parser meant to serve this purpose. But why use this particular parser for tokenization? What is the point, and why is it more advantageous? What are its EOL and whitespace methods for? Are they related to the skipWhitespace flag that I see in RegexParsers? Moreover, I cannot find any example of stacking a parser on top of the lexer. It seems to me that the higher-level RegexParsers works on Input, which is a stream of characters. How can it be a stream of tokens?
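For reference, here is a minimal sketch of the stacking I mean, assuming the standard scala-parser-combinators library: StandardTokenParsers bundles a StdLexical lexer, so the parser's Input really is a stream of tokens rather than characters. The grammar and the run helper are my own illustration, not library API.

```scala
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

// StandardTokenParsers wraps a StdLexical lexer; parsers here consume
// tokens produced by that lexer, not raw characters.
object StackedParser extends StandardTokenParsers {
  lexical.reserved += "keyword"      // lexed as a Keyword token
  lexical.delimiters += ("=", ";")

  // "keywordandtherestofthestream" lexes as one Identifier token,
  // so a keyword parser no longer matches a mere prefix of it.
  def stmt: Parser[(String, String)] = ident ~ ("=" ~> ident) ^^ {
    case name ~ value => (name, value)
  }

  // Hypothetical helper: feed the lexer's Scanner (a token stream)
  // into the token-level parser.
  def run(s: String) = phrase(stmt)(new lexical.Scanner(s))
}
```

With this setup, the lexer decides where one token ends and the next begins, which is exactly the prefix problem described above.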

Btw, is it possible to build position tracking (line:col) into that?

Upvotes: 0

Views: 90

Answers (1)

jkinkead

Reputation: 4421

rep1 inside a RegexParsers will call skipWhitespace between the things it parses, meaning that you're getting a Seq with two "a"s in it. This is documented in RegexParsers.

If you want to get line:col data, have the result type of your parser extend Positional, and wrap the parser in a call to positioned:

import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.Positional

object Parser extends RegexParsers {
  case class MyType(value: String) extends Positional
  val myType: Parser[MyType] = positioned { "typey" ^^ { MyType.apply } }
}
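A hedged usage sketch, assuming the Parser object above: on a successful parse, positioned has filled in the result's pos field, which carries line and column.

```scala
// Hypothetical usage of the Parser object defined above.
Parser.parseAll(Parser.myType, "typey") match {
  case Parser.Success(t, _) => println(s"${t.pos.line}:${t.pos.column}") // line:col where the match began
  case failure              => println(failure)
}
```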

Upvotes: 1
