Reputation: 31
While deleting managed tables from Hive, their associated files in HDFS are not being removed (on Azure Databricks). I am getting the following error:
[Simba]SparkJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: org.apache.spark.sql.AnalysisException: Can not create the managed table('`schema`.`XXXXX`'). The associated location('dbfs:/user/hive/warehouse/schema.db/XXXXX) already exists
This issue occurs intermittently. Looking for a solution to this.
Upvotes: 3
Views: 4193
Reputation: 83
Sometimes the metadata (the Hive table's schema info) itself gets corrupted. Whenever we then try to delete/drop the table, we get errors, because Spark checks for the existence of the table before deleting it.
We can avoid that if we use the Hive client to drop the table, as it skips the check for the table's existence.
Please refer to this Databricks documentation.
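As a minimal sketch of that idea (assuming a Databricks Scala notebook; the database/table names are the question's placeholders, and `ExternalCatalog` is a Spark-internal interface rather than a public API), dropping the entry directly against the metastore with `ignoreIfNotExists` sidesteps the analyzer-side existence check that a plain `DROP TABLE` performs:

// Sketch only: goes straight to the Hive metastore via Spark's internal
// external catalog, instead of through the SQL analyzer.
spark.sharedState.externalCatalog.dropTable(
  db = "schema",            // placeholder database from the question
  table = "XXXXX",          // placeholder table name from the question
  ignoreIfNotExists = true, // don't fail if the catalog entry is already gone
  purge = true              // delete the data files immediately instead of trashing them
)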
Upvotes: 0
Reputation: 1718
I've started hitting this. It was fine for the last year, then I think something changed with the storage attachment. Perhaps there are enhancements going on in the background that are causing issues (PaaS!). As a safeguard I'm manually deleting the directory path as well as dropping the table, until I can get a decent explanation of what's going on or get a support call answered.
Use
dbutils.fs.rm("dbfs:/user/hive/warehouse/schema.db/XXXXX", true)
Be careful with that though! Get the path wrong and it could be tragic!
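Since a wrong path is the main hazard, here is a guarded sketch (assuming a Databricks Scala notebook; the path and names are the question's placeholders) that drops the table first and only then removes the leftover warehouse directory:

// Sketch: guard the path before the recursive delete, drop the catalog
// entry, then remove whatever files the drop left behind.
val tablePath = "dbfs:/user/hive/warehouse/schema.db/XXXXX" // placeholder path
require(tablePath.startsWith("dbfs:/user/hive/warehouse/"),
  s"Refusing to rm outside the warehouse: $tablePath")

spark.sql("DROP TABLE IF EXISTS `schema`.`XXXXX`") // remove the metadata first
dbutils.fs.rm(tablePath, recurse = true)           // then the orphaned files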
Upvotes: 3