Reputation: 63
I'm trying to load a PySpark dataframe into Azure SQL DB using the Apache Spark Connector for SQL Server and Azure SQL, in an Azure Databricks environment.
[Environment] - Azure DataBricks
[Dataset] - NYC Yellow Taxi Dataset
It works fine for data sizes around 30M, but for sizes around 90M I get the error below:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 20.0 failed 4 times, most recent failure: Lost task 5.3 in stage 20.0 (TID 381) (10.139.64.7 executor 5): com.microsoft.sqlserver.jdbc.SQLServerException: Database '[database]' on server '[servername]' is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of [some id]
The code that I use:
try:
    df.write \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .mode("overwrite") \
        .option("truncate", True) \
        .option("url", url) \
        .option("dbtable", "dbo.nyc_yellow_trip_test_2017") \
        .option("user", username) \
        .option("password", password) \
        .save()
except ValueError as error:
    print("Connector write failed", error)
Upvotes: 0
Views: 276
Reputation: 3240
Sometimes that error is the result of intermittent failures in specific regions.
You can check Resource health in the left vertical panel of the Azure portal, as shown in the image below.
In the cloud environment you'll find that failed and dropped database connections happen periodically. That's partly because you're going through more load balancers compared to the on-premises environment, where your web server and database server have a direct physical connection.

Also, sometimes when you're dependent on a multi-tenant service you'll see calls to the service get slower or time out because someone else who uses the service is hitting it heavily. In other cases you might be the user who is hitting the service too frequently, and the service deliberately throttles you (denies connections) in order to prevent you from adversely affecting other tenants of the service.
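Since these failures are transient, the usual remedy is to retry the write with exponential backoff rather than failing the job. Below is a minimal sketch of such a retry wrapper; the helper name `write_with_retry` and the parameter defaults are my own choices, not part of the connector, and in a real Databricks job the caught exception would be the Py4J-wrapped `SQLServerException` rather than a plain `Exception`:

```python
import time

def write_with_retry(write_fn, max_attempts=4, base_delay=5.0):
    """Retry a zero-argument write callable on transient failures.

    Delays grow exponentially: base_delay, 2x, 4x, ...
    Returns the attempt number on which the write succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            write_fn()
            return attempt
        except Exception:  # ideally narrow this to the connector's transient errors
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

You would then wrap your existing write in a lambda, e.g. `write_with_retry(lambda: df.write.format("com.microsoft.sqlserver.jdbc.spark")....save())`. Retrying blindly on every exception will also retry permanent errors (bad credentials, schema mismatch), so narrowing the `except` clause is worthwhile in production code.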
Upvotes: 0