justcode

Reputation: 128

Difference in writing Spark applications

Is there a difference in performance between writing Spark applications with DataFrame method chains and writing them with Spark SQL? I know that writing code with methods is more flexible, but I'm not sure how the two compare in performance.

Example:

df.select(...).filter(...) // etc.

versus

spark.sql("<insert query here>")

Upvotes: 0

Views: 42

Answers (1)

Constantine

Reputation: 1416

There is no difference in performance between

df.select($"some_col").filter($"filter_col" === "somevalue")

and

spark.sql("select some_col from some_table where filter_col = 'somevalue'")

Both forms are turned into a logical plan and optimized by Catalyst, so the Spark plan generated for the two cases is the same (assuming df is backed by some_table). Which one to use is a matter of preference.

You can check the Spark plan by running:

df.queryExecution.sparkPlan
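
To see this side by side, here is a minimal sketch that builds the same query both ways and prints the plan for each. It assumes the spark-sql dependency is on the classpath and a local SparkSession; the object name PlanComparison, the helper build, and the sample rows registered as some_table are made up for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PlanComparison {
  // Builds the same query through the DataFrame API and through SQL.
  def build(spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._

    // Made-up rows standing in for some_table (an assumption for this sketch).
    val df = Seq(("a", "somevalue"), ("b", "othervalue")).toDF("some_col", "filter_col")
    df.createOrReplaceTempView("some_table")

    val viaApi = df.select($"some_col").filter($"filter_col" === "somevalue")
    val viaSql = spark.sql("select some_col from some_table where filter_col = 'somevalue'")
    (viaApi, viaSql)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("plan-comparison")
      .getOrCreate()
    val (viaApi, viaSql) = build(spark)

    // Print the physical plan chosen for each version so they can be compared by eye.
    println(viaApi.queryExecution.sparkPlan)
    println(viaSql.queryExecution.sparkPlan)

    spark.stop()
  }
}
```

Calling explain(true) on either DataFrame prints the parsed, analyzed, optimized and physical plans in one go, which is handy if you want to see where the two versions converge. Internal attribute IDs in the printed plans may differ between the two, so compare the operators and not the raw strings.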

Further reading on Spark plans:

https://dzone.com/articles/understanding-optimized-logical-plan-in-spark
https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html

Upvotes: 1
