Gabor Szarnyas

Reputation: 5047

Convert from a logical plan to another logical plan in Spark Catalyst

I use Spark Catalyst to represent the query plans of an openCypher query engine, ingraph. During query planning, I would like to convert a certain logical plan (Plan1) to another logical plan (Plan2). (I have tried to keep the question simple, so I have omitted some details. The project is fully open-source, so if required, I am happy to provide more information on why this is necessary.)

The best approach I could find is to use transformDown recursively. Here is a small example that converts from Plan1Nodes to Plan2Nodes by replacing each OpA1 instance with OpA2 and each OpB1 instance with OpB2.

import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, UnaryNode}

trait Plan1Node extends LogicalPlan

case class OpA1() extends LeafNode with Plan1Node {
  override def output: Seq[Attribute] = Seq()
}
case class OpB1(child: Plan1Node) extends UnaryNode with Plan1Node {
  override def output: Seq[Attribute] = Seq()
}

trait Plan2Node extends LogicalPlan

case class OpA2() extends LeafNode with Plan2Node {
  override def output: Seq[Attribute] = Seq()
}
case class OpB2(child: Plan2Node) extends UnaryNode with Plan2Node {
  override def output: Seq[Attribute] = Seq()
}

object Plan1ToPlan2 {
  def transform(plan: Plan1Node): Plan2Node = {
    plan.transformDown {
      case OpA1() => OpA2()
      case OpB1(child) => OpB2(transform(child))
    }
  }.asInstanceOf[Plan2Node]
}

This approach does the job. This code:

val p1 = OpB1(OpA1())
val p2 = Plan1ToPlan2.transform(p1)

Results in:

p1: OpB1 = OpB1
+- OpA1

p2: Plan2Node = OpB2
+- OpA2

However, using asInstanceOf[Plan2Node] is definitely a code smell. I considered using a Strategy to define the conversion rules, but that class is for converting logical plans to physical plans, not one logical plan to another.
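The cast is needed because transformDown is declared on the tree's base type: the rule is a PartialFunction from LogicalPlan to LogicalPlan, and the result is typed as LogicalPlan, whatever the rule actually produces. The following is a minimal sketch of that situation, assuming a simplified stand-in for a Catalyst-style tree. Node, withChildren and the operator classes are hypothetical and are not Spark's classes.

```scala
object TransformDownSketch {
  // Hypothetical miniature of a Catalyst-style tree; NOT Spark's actual classes.
  trait Node {
    def children: Seq[Node]
    def withChildren(newChildren: Seq[Node]): Node

    // Apply the rule to this node first, then recurse into the children of the result.
    // The declared result type is the base type Node, regardless of what the rule returns.
    def transformDown(rule: PartialFunction[Node, Node]): Node = {
      val rewritten: Node = if (rule.isDefinedAt(this)) rule(this) else this
      rewritten.withChildren(rewritten.children.map(_.transformDown(rule)))
    }
  }

  trait Plan1Node extends Node
  trait Plan2Node extends Node

  case class OpA1() extends Plan1Node {
    def children: Seq[Node] = Seq.empty
    def withChildren(newChildren: Seq[Node]): Node = this
  }
  case class OpB1(child: Node) extends Plan1Node {
    def children: Seq[Node] = Seq(child)
    def withChildren(newChildren: Seq[Node]): Node = OpB1(newChildren.head)
  }
  case class OpA2() extends Plan2Node {
    def children: Seq[Node] = Seq.empty
    def withChildren(newChildren: Seq[Node]): Node = this
  }
  case class OpB2(child: Node) extends Plan2Node {
    def children: Seq[Node] = Seq(child)
    def withChildren(newChildren: Seq[Node]): Node = OpB2(newChildren.head)
  }

  // The rule only rewrites the node it is given; transformDown handles the recursion.
  val rule: PartialFunction[Node, Node] = {
    case OpA1()      => OpA2()
    case OpB1(child) => OpB2(child)
  }

  // Statically this is only a Node, even though every node in it is a Plan2Node.
  val result: Node = OpB1(OpA1()).transformDown(rule)
}
```

In this sketch the child fields are typed as Node rather than Plan2Node, because the intermediate tree briefly mixes operators from both families while the rule works its way down. The compiler therefore cannot prove that the final tree consists only of Plan2Node instances, which is why a cast (or a runtime check) is needed at the end.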

Is there a more elegant way to define a transformation between logical plans? Or is using multiple logical plans considered an antipattern?

Upvotes: 5

Views: 1177

Answers (1)

Gabor Szarnyas

Reputation: 5047

(Answering my own question.)

Replacing transformDown with plain pattern matching (match) and explicit recursive calls solved the type issue, so the cast is no longer needed:

object Plan1ToPlan2 {
  def transform(plan: Plan1Node): Plan2Node = {
    plan match {
      case OpA1() => OpA2()
      case OpB1(child) => OpB2(transform(child))
    }
  }
}

This seems to be a type-safe and (based on my limited understanding of Catalyst) idiomatic solution.
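One possible refinement, assuming all operators of a plan family can live in the same source file: declaring the marker traits as sealed lets the Scala compiler check the match for exhaustiveness, so a newly added Plan1 operator without a conversion case produces a compile-time warning instead of a MatchError at runtime. The sketch below uses plain stand-in types without the Catalyst base classes to stay self-contained. In the real code, the sealed modifier would go on the Plan1Node and Plan2Node traits from the question.

```scala
object SealedPlanSketch {
  // Stand-ins for the question's node types, without the Catalyst base classes.
  sealed trait Plan1Node
  final case class OpA1() extends Plan1Node
  final case class OpB1(child: Plan1Node) extends Plan1Node

  sealed trait Plan2Node
  final case class OpA2() extends Plan2Node
  final case class OpB2(child: Plan2Node) extends Plan2Node

  // Because Plan1Node is sealed, the compiler warns if a case is missing here.
  def transform(plan: Plan1Node): Plan2Node = plan match {
    case OpA1()      => OpA2()
    case OpB1(child) => OpB2(transform(child))
  }
}
```

Like the version above, this recursion is not tail-recursive, so its stack depth grows with the depth of the plan.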

Upvotes: 0
