Reputation: 1
I am a bit stuck with parser combinators and whitespace. A keyword parser succeeds even when the keyword is only a prefix of the input, as in `"keywordandtherestofthestream"`. Moreover, `identifier = rep1("a")` consumes both letters of `a a` as if they were a single `aa`. These pieces of information suggest that I need to do some lexing first, and that it is possible to stack the parser combinators on top of a lexer.
I see that there is a special `Lexical` parser to serve that purpose. But why use this parser for tokenization? What is the point? What makes it more advantageous? What are its `EOL` and `whitespace` methods for? Are they in any way related to the `skipWhitespace` that I see in `RegexParsers`? Moreover, I cannot find any example of stacking a parser on top of a lexer. It seems to me that the higher-level `RegexParsers` uses `Input`, which is a stream of characters. How can it be a stream of tokens?
Btw, is it possible to build position tracking (`line:col`) into that?
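For reference, here is a minimal sketch of what stacking the combinators on top of a lexer can look like (hypothetical object name; assumes the scala-parser-combinators library). `StandardTokenParsers` bundles a `StdLexical` lexer, and `new lexical.Scanner(...)` is what turns the character stream into a token `Reader` that the token-level parsers consume:

```scala
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

// StandardTokenParsers comes with a StdLexical lexer in its `lexical` member.
// The Scanner converts a character stream into a Reader of tokens, which the
// token-level parsers (ident, numericLit, keyword, ...) then consume.
object TokenDemo extends StandardTokenParsers {
  lexical.reserved += "keyword" // "keyword" tokenizes as a keyword, not an ident

  val idents: Parser[List[String]] = rep1(ident)

  def parseIdents(s: String): ParseResult[List[String]] =
    phrase(idents)(new lexical.Scanner(s))
}
```

With this sketch, `TokenDemo.parseIdents("foo bar")` succeeds with `List("foo", "bar")`, while `TokenDemo.parseIdents("keyword")` fails, because that token is a reserved word rather than an identifier.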
Upvotes: 0
Views: 90
Reputation: 4421
`rep1` inside a `RegexParsers` will call `skipWhitespace` between the things it parses, meaning that you're getting a `Seq` with two `a`s in it. This is as documented in `RegexParsers`.
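A quick sketch of that behaviour (hypothetical object names), including how to switch the skipping off:

```scala
import scala.util.parsing.combinator.RegexParsers

// Default RegexParsers behaviour: whitespace is skipped before every
// sub-parse, so rep1("a") sees two separate "a" tokens in "a a".
object SkippingDemo extends RegexParsers {
  val identifier: Parser[List[String]] = rep1("a")
}

// Overriding skipWhitespace makes the same grammar whitespace-sensitive.
object StrictDemo extends RegexParsers {
  override val skipWhitespace = false
  val identifier: Parser[List[String]] = rep1("a")
}
```

`SkippingDemo.parseAll(SkippingDemo.identifier, "a a")` succeeds with `List("a", "a")`, whereas `StrictDemo` fails on the same input because parsing stops at the space.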
If you want to get `line:col` data, have the result type of your parser extend `Positional`, and wrap the parser in a call to `positioned`:
```scala
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.Positional

object Parser extends RegexParsers {
  case class MyType(value: String) extends Positional

  val myType: Parser[MyType] = positioned { "typey" ^^ MyType.apply }
}
```
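To see the position on the result, a self-contained usage sketch (hypothetical object name, same structure as above) might look like this; after a successful parse, the line and column are available through the `pos` field that `Positional` provides:

```scala
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.Positional

// Same shape as the answer's Parser object, repeated here so the
// sketch is self-contained.
object PosDemo extends RegexParsers {
  case class MyType(value: String) extends Positional
  val myType: Parser[MyType] = positioned { "typey" ^^ MyType.apply }
}

val result = PosDemo.parseAll(PosDemo.myType, "typey").get
println(s"${result.pos.line}:${result.pos.column}") // prints "1:1"
```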
Upvotes: 1