Roberto Bonvallet

Reputation: 33329

Make parser include surrounding whitespace in string literals

I wrote a Scala parser for an in-house expression language that has double quote-delimited string literals:

object MyParser extends JavaTokenParsers {
  lazy val strLiteral = "\"" ~> """[^"]*""".r <~ "\"" ^^ {
    case x => StringLiteral(x)
  }
  // ...
}

(The actual code is a bit different, since I support "" as an escape sequence for a literal double quote. This is not relevant to the discussion, but it's the reason why I cannot just use the stringLiteral parser from JavaTokenParsers.)

I noticed that the parser fails to include whitespace at the beginning and at the end of a string:

"a"   parsed as StringLiteral("a")
" a"  parsed as StringLiteral("a")
"a "  parsed as StringLiteral("a")
" a " parsed as StringLiteral("a")

I tried matching whitespace in the regex:

"\"" ~> """\s*[^"]*\s*""".r <~ "\""

and also using the explicit whiteSpace parser:

"\"" ~> whiteSpace.? ~ """[^"]*""".r ~ whiteSpace.? <~ "\""

but in both cases the ~> operator has already consumed and ignored the spaces before there's a chance to read and handle them.

I know that I can set skipWhitespace = false, but I prefer not to, since in general I want to allow arbitrary whitespace around tokens in this language.

What's a simple and clean strategy to include surrounding whitespace in string literals?

Upvotes: 2

Views: 152

Answers (1)

Aivean

Reputation: 10882

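One option is to use a single regex for the whole string literal, delimiters included. RegexParsers skips whitespace only before each literal or regex parser, never inside a regex match, so the spaces after the opening quote are kept: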

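val stringLiteral: Parser[String] = """"([^"]*("")?)*"""".r

and then strip the enclosing quotes from the match afterwards. Inside a JavaTokenParsers subclass the val needs a different name or an override modifier, because JavaTokenParsers already defines stringLiteral.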
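
To make this concrete, here is a minimal sketch of how the single-regex approach might look inside the question's parser. It assumes a hypothetical StringLiteral case class wrapping a String (the question does not show its definition) and keeps the question's strLiteral name. The regex is my own variation on the pattern above, not the original: (?:[^"]|"")* should accept the same literals while avoiding nested quantifiers. The un-escaping of "" into a single quote is likewise an assumption about what the real code wants.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical AST node standing in for the question's StringLiteral.
case class StringLiteral(value: String)

object MyParser extends JavaTokenParsers {
  // One regex covers both delimiters and the body, so whitespace skipping
  // can only happen before the opening quote, never inside the literal.
  // (?:[^"]|"")* accepts any non-quote character or a doubled quote.
  lazy val strLiteral: Parser[StringLiteral] =
    """"(?:[^"]|"")*"""".r ^^ { raw =>
      // Drop the two delimiters, then turn the "" escape into one quote.
      val body = raw.substring(1, raw.length - 1)
      StringLiteral(body.replace("\"\"", "\""))
    }
  // ...
}
```

With this in place, MyParser.parseAll(MyParser.strLiteral, "\" a \"") should succeed with StringLiteral(" a "), while skipWhitespace stays enabled for the rest of the grammar.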

Upvotes: 1
