Kenneth McDonald

Reputation: 131

Having some simple problems with Scala combinator parsers

First, the code:

package com.digitaldoodles.markup

import scala.util.parsing.combinator.{Parsers, RegexParsers}
import com.digitaldoodles.rex._


class MarkupParser extends RegexParsers {
    val stopTokens = (Lit("{{") | "}}" | ";;" | ",,").lookahead
    val name: Parser[String] = """[@#!$]?[a-zA-Z][a-zA-Z0-9]*""".r
    val content: Parser[String] = (patterns.CharAny ** 0 & stopTokens).regex
    val function: Parser[Any] = name ~ repsep(content, "::") <~ ";;"
    val block1: Parser[Any] = "{{" ~> function
    val block2: Parser[Any] = "{{" ~> function <~ "}}"
    val lst: Parser[Any] = repsep("[a-z]", ",") 
}

object ParseExpr extends MarkupParser {
    def main(args: Array[String]) {
        println("Content regex is ", (patterns.CharAny ** 0 & stopTokens).regex)
        println(parseAll(block1, "{{@name 3:4:foo;;"))
        println(parseAll(block2, "{{@name 3:4:foo;; stuff}}"))
        println(parseAll(lst, "a,b,c")) 
    }
}

then, the run results:

[info] == run ==
[info] Running com.digitaldoodles.markup.ParseExpr 
(Content regex is ,(?:[\s\S]{0,})(?=(?:(?:\{\{|\}\})|;;)|\,\,))
[1.18] parsed: (@name~List(3:4:foo))
[1.24] failure: `;;' expected but `}' found

{{@name 3:4:foo;; stuff}}
                       ^

[1.1] failure: string matching regex `\z' expected but `a' found

a,b,c
^

I use a custom library to assemble some of my regexes, so I've printed out the "content" regex; it's supposed to match any text up to, but not including, certain token patterns, enforced with a positive lookahead assertion.

Finally, the problems:

1) The first run on "block1" succeeds, but shouldn't, because the separator in the "repsep" function is "::", yet ":" are parsed as separators.

2) The run on "block2" fails, presumably because the lookahead clause isn't working--but I can't figure out why this should be. The lookahead clause was already exercised in the "repsep" on the run on "block1" and seemed to work there, so why should it fail on block 2?

3) The simple repsep exercise on "lst" fails because internally, the parser engine seems to be looking for a boundary--is this something I need to work around somehow?

Thanks, Ken

Upvotes: 2

Views: 739

Answers (1)

Daniel C. Sobral

Reputation: 297295

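1) No, ":" is not being parsed as a separator. If it were, the output would be (@name~List(3, 4, foo)). What you got is a one-element list, List(3:4:foo): content matched the whole of "3:4:foo", and repsep never saw a "::".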

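2) It happens because "}}" is also a stop token and the content regex is greedy, so it takes the longest match it can: everything up to the "}}", which swallows the ";;" as well. The function parser then looks for ";;" and finds "}" instead. It worked on block1 only because the final ";;" was the only stop token in that input. If you make the preceding expression non-eager, content will stop at the first ";;" and the parse will then fail at "s" in "stuff", which I presume is what you expected.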

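3) You passed a literal, not a regex: a plain String becomes a parser that matches that exact text. The literal "[a-z]" never matches, so repsep succeeds with an empty list and parseAll then complains that it is not at the end of the input. Change "[a-z]" to "[a-z]".r and it will work.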
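
To see points 1 and 2 without the parser machinery, here is a minimal sketch that uses only scala.util.matching.Regex. The stop-token lookahead is written by hand and is assumed to be equivalent to the pattern printed in the question; the names LookaheadDemo, greedy and reluctant are made up for this illustration and are not part of the rex library. findPrefixOf is used because RegexParsers also matches a regex as a prefix of the remaining input.

```scala
object LookaheadDemo {
  // Positive lookahead for the four stop tokens (hand-written here; the
  // question builds its pattern with a custom library instead).
  private val stop = """(?=\{\{|\}\}|;;|,,)"""

  // Greedy: consumes everything, then backtracks to the LAST stop token.
  val greedy = ("""[\s\S]*""" + stop).r

  // Reluctant: grows one character at a time and stops at the FIRST stop token.
  val reluctant = ("""[\s\S]*?""" + stop).r

  def main(args: Array[String]): Unit = {
    val rest1 = "3:4:foo;;"          // block1 input after "{{@name "
    val rest2 = "3:4:foo;; stuff}}"  // block2 input after "{{@name "

    println(greedy.findPrefixOf(rest1))     // Some(3:4:foo)
    println(greedy.findPrefixOf(rest2))     // Some(3:4:foo;; stuff)
    println(reluctant.findPrefixOf(rest2))  // Some(3:4:foo)
  }
}
```

With the reluctant form, content leaves ";;" in place for the function parser in both inputs. The block2 parse would then still fail, but at "stuff", where "}}" is expected.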

Upvotes: 2
