Reputation: 1311
I'm working with the native parser combinator library in Scala and I'd like to parse some parts of my input, but not others. Specifically, I'd like to discard all of the arbitrary text between inputs that I care about. For example, with this input:
begin
Text I care about
Text I care about
DONT CARE
Text I don't care about
begin
More text I care about
...
Right now I have:
import scala.util.parsing.combinator.RegexParsers

object MyParser extends RegexParsers {
  val beginToken: Parser[String] = "begin"
  val dontCareToken: Parser[String] = "DONT CARE"
  val text: Parser[String] = not(dontCareToken) ~> """([^\n]+)""".r
  val document: Parser[String] = beginToken ~> text.+ <~ dontCareToken ^^ { _.mkString("\n") }
  val documents: Parser[Iterable[String]] = document.+
}
but I'm not sure how to ignore the text that comes after DONT CARE and until the next begin. Specifically, I don't want to make any assumptions about the form of that text, I just want to start parsing again at the next begin statement.
Upvotes: 3
Views: 1016
Reputation: 1725
You almost had it. Parse the text you don't care about, then discard the result.
I added dontCareText and skipDontCare, and made skipDontCare optional in your document parser.
import scala.util.parsing.combinator.RegexParsers

object MyParser extends RegexParsers {
  val beginToken: Parser[String] = "begin"
  val dontCareToken: Parser[String] = "DONT CARE"
  val text: Parser[String] = not(dontCareToken) ~> """([^\n]+)""".r
  val dontCareText: Parser[String] = not(beginToken) ~> """([^\n]+)""".r
  val skipDontCare = dontCareToken ~ dontCareText ^^^ ""
  val document: Parser[String] =
    beginToken ~> text.+ <~ opt(skipDontCare) ^^ {
      _.mkString("\n")
    }
  val documents: Parser[Iterable[String]] = document.+
}

val s = """begin
Text I care about
Text I care about
DONT CARE
Text I don't care about
begin
More text I care about
"""

MyParser.parseAll(MyParser.documents, s)
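One caveat: dontCareText matches a single line, so the parser above only skips one line after DONT CARE. If the ignored region can span several lines, a variant along the following lines may help. This is a sketch, not a drop-in replacement: the names SkippingParser, line, and parseDocuments are made up for illustration, and it assumes that a kept line never starts with "begin" or "DONT CARE" (literals match as prefixes, so a line such as "beginner text" would also be treated as a begin token).

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical variant of MyParser that skips any number of lines after DONT CARE.
object SkippingParser extends RegexParsers {
  val beginToken: Parser[String] = "begin"
  val dontCareToken: Parser[String] = "DONT CARE"

  // One non-empty line; RegexParsers skips leading whitespace (including newlines) first.
  private val line: Parser[String] = """[^\n]+""".r

  // A kept line: it starts neither a skipped region nor a new document.
  val text: Parser[String] = not(dontCareToken | beginToken) ~> line

  // DONT CARE followed by zero or more lines, stopping before the next begin.
  val skipDontCare: Parser[List[String]] =
    dontCareToken ~> rep(not(beginToken) ~> line)

  // The skipped lines are parsed but dropped by <~.
  val document: Parser[String] =
    beginToken ~> rep1(text) <~ opt(skipDontCare) ^^ { _.mkString("\n") }

  val documents: Parser[List[String]] = rep1(document)

  // Convenience wrapper: Right(documents) on success, Left(error message) otherwise.
  def parseDocuments(input: String): Either[String, List[String]] =
    parseAll(documents, input) match {
      case Success(docs, _) => Right(docs)
      case ns: NoSuccess    => Left(ns.msg)
    }
}
```

Excluding beginToken from text also keeps two consecutive documents apart when the first one has no DONT CARE section. Calling SkippingParser.parseDocuments(s) with the sample input should give one string per document, with the lines inside each document joined by "\n".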
Upvotes: 6