raffad

Reputation: 21

Job aborted when writing table using different cluster on Databricks

I have two clusters on Databricks, and I used one (cluster1) to write a table to the data lake. I need to use the other cluster (cluster2) to schedule the job that writes this table. However, this error occurs:

Py4JJavaError: An error occurred while calling o344.saveAsTable.
: org.apache.spark.SparkException: Job aborted.

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
in stage 3740.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3740.0 (TID 
113976, 10.246.144.215, executor 13): org.apache.hadoop.security.AccessControlException: 
CREATE failed with error 0x83090aa2 (Forbidden. ACL verification failed. Either the 
resource does not exist or the user is not authorized to perform the requested operation.). 
[7974c88e-0300-4e1b-8f07-a635ad8637fb] failed with error 0x83090aa2 (Forbidden. 
ACL verification failed. Either the resource does not exist or the user is not authorized 
to perform the requested operation.).

From the "Caused by" message it seems that I do not have the authorization to write on the datalake, but if i change the table name it successfully write the df onto the datalake.

I am trying to write the table with the following command:

df.write \
    .format('delta') \
    .mode('overwrite') \
    .option('path', path) \
    .option('overwriteSchema', 'true') \
    .saveAsTable(table_name)

I tried dropping the table and rewriting it using cluster2, but this doesn't work, as if the location on the data lake were already occupied: only with cluster1 can I write to that location.

In the past I simply changed the table name as a workaround, but this time I need to keep the old name.

How can I solve this? Why is the data lake location tied to the cluster that wrote the table?

Upvotes: 1

Views: 1006

Answers (1)

raffad

Reputation: 21

The issue was caused by the two clusters using different Service Principals.

To solve the problem, I had to drop the table and remove its path on the data lake using cluster1. Then I could write the table again using cluster2.

The command to delete the path is:

rm -r 'adl://path/to/table'
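
For reference, here is a minimal sketch of the whole sequence as a Databricks notebook cell. The table name and path below are placeholders, not the actual values; dbutils.fs.rm is the Databricks utility for deleting a path, with recurse=True for recursive removal:

    # Run this on cluster1, whose Service Principal owns the existing files.
    table_name = 'my_table'        # hypothetical placeholder name
    path = 'adl://path/to/table'   # placeholder, same path as in the question

    # Drop the metastore entry, then delete the underlying files.
    spark.sql(f"DROP TABLE IF EXISTS {table_name}")
    dbutils.fs.rm(path, recurse=True)

After this, rerunning the original df.write ... .saveAsTable(table_name) on cluster2 creates fresh files owned by cluster2's Service Principal, so the ACL check passes.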

Upvotes: 1
