Reputation: 41
I have a query that joins 4 tables, and I used query pushdown to read the result into a DataFrame:
val df = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://ip/dbname")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("user", "username")
  .option("password", "password")
  .option("dbtable", s"($query) as temptable")
  .load()
The individual tables contain 430, 350, 64, and 2354 records respectively. The load takes 12.784 sec, plus 2.119 sec to create the SparkSession.
Then I count the result:
val count = df.count()
println(s"count $count")
The total execution time is then 25.806 sec, and the result contains only 430 records.
When I run the same query in SQL Workbench, it completes in only a few seconds. I also tried cache() after load(), but it takes the same time. How can I make this execute faster?
Upvotes: 1
Views: 431
Reputation: 1529
Try options like:
partitionColumn
numPartitions
lowerBound
upperBound
These options can help improve query performance, as they create multiple partitions and the read happens in parallel.
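A minimal sketch reusing the options from the question (the partition column "id" is an assumption; it must be a numeric column in the query result, and all four options have to be set together):

val df = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://ip/dbname")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("user", "username")
  .option("password", "password")
  .option("dbtable", s"($query) as temptable")
  // "id" is a hypothetical numeric column in the result set
  .option("partitionColumn", "id")
  // number of parallel JDBC reads
  .option("numPartitions", "4")
  // lowerBound/upperBound only define the stride of each partition;
  // they do not filter rows out of the result
  .option("lowerBound", "1")
  .option("upperBound", "2354")
  .load()

Note that with only a few hundred rows in the result, partitioning is unlikely to help much; these options pay off on genuinely large reads.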
Upvotes: 0
Reputation: 25929
You are using a tool meant to handle big data to solve a toy example, so you are getting all of the overhead and none of the benefits.
Upvotes: 5