jcc333
jcc333

Reputation: 769

Representing ad-hoc operator precedence in Scala Parser

So I'm trying to write a parser specifically for the arithmetic fragment of a programming language I'm playing with, using scala RegexParsers. As it stands, my top-level expression parser is of the form:

parser: Parser[Exp] = binAppExp | otherKindsOfParserLike | lval | int

It accepts lvals (things like "a.b, a.b[c.d], a[b], {record=expression, like=this}" just fine. Now, I'd like to enable expressions like "1 + b / c = d", but potentially with (source language, not Scala) compile-time user-defined operators.

My initial thought was, if I encode the operations recursively and numerically by precedence, then I could add higher precedences ad-hoc, and each level of precedence could parse consuming lower-precedence sub-terms on the right-side of the operation expression. So, I'm trying to build a toy of that idea with just some fairly common operators. So I'd expect "1 * 2+1" to parse into something like Call(*, Seq(1, Call(+ Seq(2,1)))), where case class Call(functionName: String, args: Seq[Exp]) extends Exp.

Instead though, it parses as IntExp(1).

Is there a reason that this can't work (is it left-recursive in a way I'm missing? If so, I'm sure there's something else wrong, or it'd never terminate, right?), or is it just plain wrong for some other reason?

  def binAppExp: Parser[Exp] = {
    //assume a registry of operations
    val ops = Map(
      (7, Set("*", "/")),
      (6, Set("-", "+")),
      (4, Set("=", "!=", ">", "<", ">=", "<=")),
      (3, Set("&")),
      (2, Set("|"))
    )

    //relevant ops for a level of precedence
    def opsWithPrecedence(n: Int): Set[String] = ops.getOrElse(n, Set.empty)

    //parse an op with some level of precedence
    def opWithPrecedence(n: Int): Parser[String] = ".+".r ^? (
      { case s if opsWithPrecedence(n).contains(s) => s },
      { case s => s"SYMBOL NOT FOUND: $s" }
      )

    //assuming the parse happens, encode it as an AST representation
    def folder(h: Exp, t: LangParser.~[String, Exp]): CallExp =
      CallExp(t._1, Seq(h, t._2))

    val maxPrecedence: Int = ops.maxBy(_._1)._1

    def term: (Int => Parser[Exp]) = {
      case 0 => lval | int | notApp | "(" ~> term(maxPrecedence) <~ ")"
      case n =>
        val lowerTerm = term(n - 1)
        lowerTerm ~ rep(opWithPrecedence(n) ~ lowerTerm) ^^ {
          case h ~ ts => ts.foldLeft(h)(folder)
        }
    }

    term(maxPrecedence)
  }

Upvotes: 0

Views: 311

Answers (1)

jcc333
jcc333

Reputation: 769

Okay, so there was nothing inherently impossible with what I was trying to do, it was just wrong in the details.

The core idea is just: maintain a mapping from level of precedence to operators/parsers, and recursively look for parses based on that table. If you allow parenthetical expressions, just nest a call to your most precedent possible parser within the call to the parenthetical terms' parser.

Just in case anyone else ever wants to do something like this, here's code for a set of arithmetic/logical operators, heavily commented to relate it to the above:

 def opExp: Parser[Exp] = {
sealed trait Assoc

val ops = Map(
  (1, Set("*", "/")),
  (2, Set("-", "+")),
  (3, Set("=", "!=", ">", "<", ">=", "<=")),
  (4, Set("&")),
  (5, Set("|"))
)

def opsWithPrecedence(n: Int): Set[String] = ops.getOrElse(n, Set.empty)

/* before, this was trying to match the remainder of the expression,
   so something like `3 - 2` would parse the Int(3),
   and try to pass "- 2" as an operator to the op parser.
   RegexParsers has an implicit def "literal : String => SubclassOfParser[String]",
   that I'm using explicitly here.
*/

def opWithPrecedence(n: Int): Parser[String] = {
  val ops = opsWithPrecedence(n)
  if (ops.size > 1) {
    ops.map(literal).fold (literal(ops.head)) {
      case (l1, l2) => l1 | l2
    }
  } else if (ops.size == 1) {
    literal(ops.head)
  } else {
    failure(s"No Ops for Precedence $n")
  }
}

def folder(h: Exp, t: TigerParser.~[String, Exp]): CallExp = CallExp(t._1, Seq(h, t._2))

val maxPrecedence: Int = ops.maxBy(_._1)._1

def term: (Int => Parser[Exp]) = {
  case 0 => lval | int | "(" ~> { term(maxPrecedence) } <~ ")"
  case n if n > 0 =>
    val lowerTerm = term(n - 1)
    lowerTerm ~ rep(opWithPrecedence(n) ~ lowerTerm) ^^ {
      case h ~ ts if ts.nonEmpty => ts.foldLeft(h)(folder)
      case h ~ _ => h
    }
}

term(maxPrecedence)

}

Upvotes: 1

Related Questions