tolyasik

Reputation: 355

How to extend Spark Catalyst optimizer with custom rules?

I want to use Catalyst rules to transform a star-schema (https://en.wikipedia.org/wiki/Star_schema) SQL query into a query against a denormalized star schema, where some fields from the dimension tables are represented in the fact table. I tried to find extension points for adding my own rules to perform the transformation described above, but I didn't find any. So I have the following questions:

  1. How can I add my own rules to the Catalyst optimizer?
  2. Is there another way to implement the functionality described above?

Upvotes: 5

Views: 2498

Answers (2)

Atais

Reputation: 11285

Following @Ambling's advice, you can use sparkSession.experimental.extraStrategies to add your functionality to the SparkPlanner.

Here is an example strategy that simply prints "Hello world!" to the console:

import org.apache.spark.sql.Strategy
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

object MyStrategy extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = {
    println("Hello world!")
    Nil // produce no physical plan here; see note below
  }
}
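
Returning Nil means the strategy has nothing to contribute for that plan, so planning falls through to Spark's built-in strategies.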

with an example run:

val spark = SparkSession.builder().master("local").getOrCreate()

// Register the strategy before the query is planned.
spark.experimental.extraStrategies = Seq(MyStrategy)
val q = spark.catalog.listTables.filter(t => t.name == "five")
q.explain(true) // planning the query invokes MyStrategy
spark.stop()

You can find a working example project on a friend's GitHub: https://github.com/bartekkalinka/spark-custom-rule-executor

Upvotes: 4

Ambling

Reputation: 446

As a hint: in Spark 2.0, you can now plug in extraStrategies and extraOptimizations through SparkSession.experimental.
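
For illustration, a minimal sketch of a custom optimizer rule registered through extraOptimizations could look like the following (MyOptimizationRule is just an illustrative name; this assumes the Spark 2.x API):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// A pass-through rule: it logs each logical plan the optimizer
// hands it and returns the plan unchanged.
object MyOptimizationRule extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = {
    println(s"MyOptimizationRule saw:\n$plan")
    plan
  }
}

val spark = SparkSession.builder().master("local").getOrCreate()

// Register the rule; it runs as part of logical optimization.
spark.experimental.extraOptimizations = Seq(MyOptimizationRule)

// Any query exercises the rule, e.g. when explain triggers optimization.
spark.range(5).filter("id > 2").explain(true)

Unlike a Strategy, which maps a logical plan to physical plans, a Rule[LogicalPlan] rewrites the logical plan itself, which is closer to what the question asks for.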

Upvotes: 2
