John Sullivan

Reputation: 1311

Scala Parser Combinators - consume until match

I'm working with the native parser combinator library in Scala and I'd like to parse some parts of my input, but not others. Specifically, I'd like to discard all of the arbitrary text between inputs that I care about. For example, with this input:

begin

Text I care about
Text I care about

DONT CARE

Text I don't care about

begin

More text I care about
...

Right now I have:

object MyParser extends RegexParsers {
    val beginToken: Parser[String] = "begin"
    val dontCareToken: Parser[String] = "DONT CARE"
    val text: Parser[String] = not(dontCareToken) ~> """([^\n]+)""".r

    val document: Parser[String] = beginToken ~> text.+ <~ dontCareToken ^^ { _.mkString("\n") }
    val documents: Parser[Iterable[String]] = document.+
}

but I'm not sure how to ignore the text that comes after DONT CARE, up to the next begin. I don't want to make any assumptions about the form of that text; I just want to resume parsing at the next begin statement.

Upvotes: 3

Views: 1016

Answers (1)

Keith Pinson

Reputation: 1725

You almost had it. Parse the text you don't care about, then discard the result.

I added two parsers, dontCareText and skipDontCare, and made skipDontCare optional at the end of your document parser.

import scala.util.parsing.combinator.RegexParsers   

object MyParser extends RegexParsers {
    val beginToken: Parser[String] = "begin"
    val dontCareToken: Parser[String] = "DONT CARE"
    val text: Parser[String] = not(dontCareToken) ~> """([^\n]+)""".r
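    // One line of ignorable text: any line that does not start with "begin".
    val dontCareText: Parser[String] = not(beginToken) ~> """([^\n]+)""".r
    // Consume the DONT CARE marker plus one ignorable line and throw both away.
    val skipDontCare = dontCareToken ~ dontCareText ^^ { _ => "" }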

    val document: Parser[String] = 
      beginToken ~> text.+ <~ opt(skipDontCare) ^^ { 
        _.mkString("\n") 
      }
    val documents: Parser[Iterable[String]] = document.+
}


val s = """begin

Text I care about
Text I care about

DONT CARE

Text I don't care about

begin

More text I care about
"""

MyParser.parseAll(MyParser.documents, s)
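
Note that dontCareText above matches a single line, so it covers the sample input but not several lines of ignorable text in a row. If that can happen, a variant along these lines might work. This is only a sketch: the object names MultiLineSkipParser and Demo are made up for illustration, and it assumes the default RegexParsers whitespace skipping and that no ignorable line starts with the word begin.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical variant of MyParser that skips any number of lines after DONT CARE.
object MultiLineSkipParser extends RegexParsers {
  val beginToken: Parser[String] = "begin"
  val dontCareToken: Parser[String] = "DONT CARE"
  val line: Parser[String] = """[^\n]+""".r

  // A line to keep: anything that does not start with the DONT CARE marker.
  val text: Parser[String] = not(dontCareToken) ~> line
  // A line to skip: anything that does not start with the next "begin".
  val dontCareText: Parser[String] = not(beginToken) ~> line
  // rep(...) accepts zero or more skipped lines; the result is thrown away.
  val skipDontCare: Parser[Unit] = dontCareToken ~ rep(dontCareText) ^^ { _ => () }

  val document: Parser[String] =
    beginToken ~> rep1(text) <~ opt(skipDontCare) ^^ { _.mkString("\n") }
  val documents: Parser[List[String]] = rep1(document)
}

object Demo {
  def main(args: Array[String]): Unit = {
    val input =
      """begin
        |Text I care about
        |DONT CARE
        |Text I don't care about
        |More text I don't care about
        |begin
        |More text I care about
        |""".stripMargin

    // parseAll returns a ParseResult; match on it to get the documents or the error.
    MultiLineSkipParser.parseAll(MultiLineSkipParser.documents, input) match {
      case MultiLineSkipParser.Success(docs, _) => docs.foreach(println)
      case failure: MultiLineSkipParser.NoSuccess => println(s"Parse failed: ${failure.msg}")
    }
  }
}
```

The only real change is rep(dontCareText) inside skipDontCare, which keeps consuming lines until one starts with begin. Matching on the ParseResult, rather than calling get, also makes a failed parse visible instead of throwing.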

Upvotes: 6
