Reputation: 43
I'm struggling with one thing. I have a 700 MB CSV which contains over 6 million rows. After filtering it contains ~3 million.
I need to write it straight to Azure SQL via JDBC. It's super slow and takes 20 minutes to insert the 3 million rows.
My cluster has 14 GB RAM and 4 cores. Here is my code.
(clearedDF.repartition(4)
.write
.format("jdbc")
.option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
.option("batchsize", 10000)
.option("url", jdbcUrl)
.option("dbtable", "dbo.weather")
.option("user", properties["user"])
.option("password", properties["password"])
.mode("append")
.save()
)
Is there any way to speed this process up?
Upvotes: 3
Views: 3154
Reputation: 4943
Thank you, Alex Ott. Posting your suggestion as an answer to help other community members.
"You can control the parallelism by calling coalesce(<N>)
or repartition(<N>)
depending on the existing number of partitions. Call coalesce
when reducing the number of partitions, and repartition
when increasing the number of partitions."
import org.apache.spark.sql.SaveMode
val df = spark.table("diamonds")
println(df.rdd.partitions.length)
// Given the number of partitions above, you can reduce the partition value by calling coalesce() or increase it by calling repartition() to manage the number of connections.
df.repartition(10).write.mode(SaveMode.Append).jdbc(jdbcUrl, "diamonds", connectionProperties)
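Applied to the DataFrame in the question, a minimal PySpark sketch of the same idea could look like this (assuming the clearedDF, jdbcUrl, and properties from the question; the partition count and batch size are example values to tune against what your Azure SQL tier can absorb):

```python
# Sketch only: reuses clearedDF, jdbcUrl, and properties from the question.
# Each partition writes over its own JDBC connection, so more partitions
# means more concurrent inserts; a larger batchsize reduces round trips.
(clearedDF
    .repartition(8)  # e.g. 2x the number of cores; tune to the target DB's limits
    .write
    .format("jdbc")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .option("url", jdbcUrl)
    .option("dbtable", "dbo.weather")
    .option("user", properties["user"])
    .option("password", properties["password"])
    .option("batchsize", 100000)  # larger batches cut per-row overhead
    .mode("append")
    .save())
```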
For more information, please refer to the MS documentation on Azure SQL using JDBC.
Upvotes: 1