Reputation: 769
So I'm trying to write a parser specifically for the arithmetic fragment of a programming language I'm playing with, using scala RegexParsers. As it stands, my top-level expression parser is of the form:
parser: Parser[Exp] = binAppExp | otherKindsOfParserLike | lval | int
It accepts lvals (things like "a.b, a.b[c.d], a[b], {record=expression, like=this}"
just fine. Now, I'd like to enable expressions like "1 + b / c = d"
, but potentially with (source language, not Scala) compile-time user-defined operators.
My initial thought was, if I encode the operations recursively and numerically by precedence, then I could add higher precedences ad-hoc, and each level of precedence could parse consuming lower-precedence sub-terms on the right-side of the operation expression. So, I'm trying to build a toy of that idea with just some fairly common operators.
So I'd expect "1 * 2+1"
to parse into something like Call(*, Seq(1, Call(+ Seq(2,1))))
, where case class Call(functionName: String, args: Seq[Exp]) extends Exp
.
Instead though, it parses as IntExp(1)
.
Is there a reason that this can't work (is it left-recursive in a way I'm missing? If so, I'm sure there's something else wrong, or it'd never terminate, right?), or is it just plain wrong for some other reason?
def binAppExp: Parser[Exp] = {
//assume a registry of operations
val ops = Map(
(7, Set("*", "/")),
(6, Set("-", "+")),
(4, Set("=", "!=", ">", "<", ">=", "<=")),
(3, Set("&")),
(2, Set("|"))
)
//relevant ops for a level of precedence
def opsWithPrecedence(n: Int): Set[String] = ops.getOrElse(n, Set.empty)
//parse an op with some level of precedence
def opWithPrecedence(n: Int): Parser[String] = ".+".r ^? (
{ case s if opsWithPrecedence(n).contains(s) => s },
{ case s => s"SYMBOL NOT FOUND: $s" }
)
//assuming the parse happens, encode it as an AST representation
def folder(h: Exp, t: LangParser.~[String, Exp]): CallExp =
CallExp(t._1, Seq(h, t._2))
val maxPrecedence: Int = ops.maxBy(_._1)._1
def term: (Int => Parser[Exp]) = {
case 0 => lval | int | notApp | "(" ~> term(maxPrecedence) <~ ")"
case n =>
val lowerTerm = term(n - 1)
lowerTerm ~ rep(opWithPrecedence(n) ~ lowerTerm) ^^ {
case h ~ ts => ts.foldLeft(h)(folder)
}
}
term(maxPrecedence)
}
Upvotes: 0
Views: 311
Reputation: 769
Okay, so there was nothing inherently impossible with what I was trying to do, it was just wrong in the details.
The core idea is just: maintain a mapping from level of precedence to operators/parsers, and recursively look for parses based on that table. If you allow parenthetical expressions, just nest a call to your most precedent possible parser within the call to the parenthetical terms' parser.
Just in case anyone else ever wants to do something like this, here's code for a set of arithmetic/logical operators, heavily commented to relate it to the above:
def opExp: Parser[Exp] = {
sealed trait Assoc
val ops = Map(
(1, Set("*", "/")),
(2, Set("-", "+")),
(3, Set("=", "!=", ">", "<", ">=", "<=")),
(4, Set("&")),
(5, Set("|"))
)
def opsWithPrecedence(n: Int): Set[String] = ops.getOrElse(n, Set.empty)
/* before, this was trying to match the remainder of the expression,
so something like `3 - 2` would parse the Int(3),
and try to pass "- 2" as an operator to the op parser.
RegexParsers has an implicit def "literal : String => SubclassOfParser[String]",
that I'm using explicitly here.
*/
def opWithPrecedence(n: Int): Parser[String] = {
val ops = opsWithPrecedence(n)
if (ops.size > 1) {
ops.map(literal).fold (literal(ops.head)) {
case (l1, l2) => l1 | l2
}
} else if (ops.size == 1) {
literal(ops.head)
} else {
failure(s"No Ops for Precedence $n")
}
}
def folder(h: Exp, t: TigerParser.~[String, Exp]): CallExp = CallExp(t._1, Seq(h, t._2))
val maxPrecedence: Int = ops.maxBy(_._1)._1
def term: (Int => Parser[Exp]) = {
case 0 => lval | int | "(" ~> { term(maxPrecedence) } <~ ")"
case n if n > 0 =>
val lowerTerm = term(n - 1)
lowerTerm ~ rep(opWithPrecedence(n) ~ lowerTerm) ^^ {
case h ~ ts if ts.nonEmpty => ts.foldLeft(h)(folder)
case h ~ _ => h
}
}
term(maxPrecedence)
}
Upvotes: 1