Reputation: 63
I'm trying to load a PySpark dataframe into Azure SQL DB using the Apache Spark Connector for SQL Server and Azure SQL, in an Azure Databricks environment.
[Environment] - Azure DataBricks
[Dataset] - NYC Yellow Taxi Dataset
It works fine for data sizes around 30M, but for sizes around 90M I get the error below:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 20.0 failed 4 times, most recent failure: Lost task 5.3 in stage 20.0 (TID 381) (10.139.64.7 executor 5): com.microsoft.sqlserver.jdbc.SQLServerException: Database '[database]' on server '[servername]' is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of [some id]
The code that I use:
try:
    df.write \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .mode("overwrite") \
        .option("truncate", True) \
        .option("url", url) \
        .option("dbtable", "dbo.nyc_yellow_trip_test_2017") \
        .option("user", username) \
        .option("password", password) \
        .save()
except ValueError as error:
    print("Connector write failed", error)
Upvotes: 0
Views: 276
Reputation: 3240
Sometimes that error is the result of intermittent failures in specific regions.
You can check Resource health in the left vertical panel of the Azure portal, as shown in the image below.
In the cloud environment you'll find that failed and dropped database connections happen periodically. That's partly because you're going through more load balancers compared to the on-premises environment, where your web server and database server have a direct physical connection.

Also, sometimes when you're dependent on a multi-tenant service you'll see calls to the service get slower or time out because someone else who uses the service is hitting it heavily. In other cases you might be the user who is hitting the service too frequently, and the service deliberately throttles you (denies connections) in order to prevent you from adversely affecting other tenants of the service.
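Since these failures are transient, the usual remedy is to retry the write with exponential backoff rather than failing the job. Below is a minimal sketch of such a retry wrapper; the helper name `write_with_retry` and the parameter defaults are my own choices, not part of the connector, and in a real Databricks job the caught exception would be the Py4J-wrapped `SQLServerException` rather than a plain `Exception`:

```python
import time

def write_with_retry(write_fn, max_attempts=4, base_delay=5.0):
    """Retry a zero-argument write callable on transient failures.

    Delays grow exponentially: base_delay, 2x, 4x, ...
    Returns the attempt number on which the write succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            write_fn()
            return attempt
        except Exception:  # ideally narrow this to the connector's transient errors
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

You would then wrap your existing write in a lambda, e.g. `write_with_retry(lambda: df.write.format("com.microsoft.sqlserver.jdbc.spark")....save())`. Retrying blindly on every exception will also retry permanent errors (bad credentials, schema mismatch), so narrowing the `except` clause is worthwhile in production code.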
Upvotes: 0