JITENDRA

Reputation: 31

Hive managed table drop doesn't delete files on HDFS. Any solutions?

When dropping managed tables from Hive, their associated files are not being removed from HDFS (on Azure Databricks). I am getting the following error:

[Simba]SparkJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: org.apache.spark.sql.AnalysisException: Can not create the managed table('`schema`.`XXXXX`'). The associated location('dbfs:/user/hive/warehouse/schema.db/XXXXX) already exists

This issue is occurring intermittently. Looking for a solution to this.

Upvotes: 3

Views: 4193

Answers (2)

Anirban

Reputation: 83

Sometimes the metadata (the schema info of the Hive table) itself gets corrupted. So whenever we try to drop the table we get errors, because Spark checks for the existence of the table before deleting it.

We can avoid that by using the Hive client to drop the table, as it skips the existence check.

Please refer to this Databricks documentation.

Upvotes: 0

Shaun Ryan

Reputation: 1718

I've started hitting this. It was fine for the last year, but now I think something is going on with the storage attachment, perhaps background enhancements that are causing issues (PaaS!). As a safeguard, I'm manually deleting the directory path as well as dropping the table, until I get a decent explanation of what's going on or a support call answered.

Use

dbutils.fs.rm("dbfs:/user/hive/warehouse/schema.db/XXXXX", true)

Be careful with that though! Get the path wrong and it could be tragic!
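To reduce that risk, the two steps (drop the table, then remove its leftover directory) can be wrapped in a small helper with a path-prefix guard. This is a sketch under the assumption of a Databricks notebook where `spark` and `dbutils` are available; the `safe_drop` helper and the guard are my own additions, not an official API, and the warehouse root is assumed to be the default `dbfs:/user/hive/warehouse/`.

```python
# Sketch: drop a managed table, then clean up any leftover warehouse files.
# Assumes a Databricks notebook where `spark` and `dbutils` exist at call time.
# The prefix guard is a hypothetical safety check, not part of Databricks.

WAREHOUSE_PREFIX = "dbfs:/user/hive/warehouse/"

def table_location(schema: str, table: str) -> str:
    """Build the default warehouse path for a managed table."""
    return f"{WAREHOUSE_PREFIX}{schema}.db/{table}"

def assert_in_warehouse(path: str) -> str:
    """Refuse to touch any path outside the warehouse root."""
    if not path.startswith(WAREHOUSE_PREFIX):
        raise ValueError(f"refusing to delete path outside warehouse: {path}")
    return path

def safe_drop(schema: str, table: str) -> None:
    """Drop the table, then recursively remove leftover files at its location."""
    path = assert_in_warehouse(table_location(schema, table))
    spark.sql(f"DROP TABLE IF EXISTS `{schema}`.`{table}`")
    dbutils.fs.rm(path, True)  # recursive delete of the leftover directory
```

The guard only defends against an obviously wrong path; it cannot tell whether the table at that location is the one you meant to drop, so the original warning still applies.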

Upvotes: 3
