Mayank Srivastava

Reputation: 87

Performance issues in loading data from Databricks to Azure SQL

I am trying to load 1 million records from a Delta table in Databricks to an Azure SQL database using the recently released Microsoft connector that supports the Python API and Spark 3.0.

Performance does not look great: it takes 19 minutes to load 1 million records. Below is the code I am using. Am I missing something here?

Configuration: 8 worker nodes, each with 28 GB of memory and 8 cores. The Azure SQL database is a 4-vCore Gen5.

try:
    df.write \
      .format("com.microsoft.sqlserver.jdbc.spark") \
      .mode("overwrite") \
      .option("url", url) \
      .option("dbtable", "lending_club_acc_loans") \
      .option("user", username) \
      .option("password", password) \
      .option("tableLock", "true") \
      .option("batchsize", "200000") \
      .option("reliabilityLevel", "BEST_EFFORT") \
      .save()
except ValueError as error:
    print("Connector write failed", error)

Is there something I can do to boost the performance?


Upvotes: 0

Views: 1516

Answers (1)

Mayank Srivastava

Reputation: 87

Repartition the data frame. My source data frame originally had a single partition; repartitioning it to 8 before the write improved the performance.

Upvotes: 2

Related Questions