Utkarsh Roy

Reputation: 43

Overwrite a Hive table without downtime

I have a Hive table backed by an HDFS path. The table is overwritten by a periodic job and has a few downstream consumers. The table gets dropped while being overwritten, and if a downstream consumer tries to access it during that window, the read throws an error and the consumer's job fails. How can I prevent the table from becoming unavailable?

Here's an approach I tried which doesn't seem to work (sketched in code below the list):

  1. Write the data to a temporary table (a copy of the original table)
  2. Get the new location of the temporary table
  3. Update the original table's location to the temporary table's location (spark.sql(s"ALTER TABLE $originalTable SET LOCATION '$tempTableLocation'"))
  4. Run spark.sql(s"MSCK REPAIR TABLE $originalTable")
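
Put together, the attempted flow looks roughly like this as a Scala sketch (the table names are placeholders, and df is assumed to hold the freshly computed data):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

    // Placeholder names; `df` is assumed to hold the freshly computed data.
    val originalTable = "db.events"
    val tempTable     = "db.events_tmp"

    // 1. Write the new data to a temporary copy of the table.
    df.write.mode("overwrite").saveAsTable(tempTable)

    // 2. Read the temp table's HDFS location back from the metastore.
    val tempTableLocation = spark.sql(s"DESCRIBE FORMATTED $tempTable")
      .filter("col_name = 'Location'")
      .select("data_type")
      .first()
      .getString(0)

    // 3. Repoint the original table at the new location.
    spark.sql(s"ALTER TABLE $originalTable SET LOCATION '$tempTableLocation'")

    // 4. Re-sync partition metadata against the new location.
    spark.sql(s"MSCK REPAIR TABLE $originalTable")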

The location does appear updated when I run DESCRIBE FORMATTED $originalTable, but when I try to load data from the original table it still reads from the previous path.

How can I fix this?

Upvotes: 0

Views: 54

Answers (1)

Koushik Roy

Reputation: 7387

The first option is a tweak to your process; you can check whether it works in your scenario:

  1. Write the data to a temporary table (a copy of the original table).
  2. Drop the original table.
  3. Use ALTER TABLE ... RENAME to rename the temp table to the original table's name, as sketched below. This is a quick operation and shouldn't take more than a second. You can try it in your scenario.
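
In Spark, that sequence could look roughly like this (a minimal sketch; the table names are placeholders and df is assumed to hold the new data):

    // Placeholder names; `df` is assumed to hold the new data.
    val originalTable = "db.events"
    val tempTable     = "db.events_tmp"

    // 1. Stage the new data in a temporary copy of the table.
    df.write.mode("overwrite").saveAsTable(tempTable)

    // 2. Drop the original table (the brief unavailability starts here).
    spark.sql(s"DROP TABLE IF EXISTS $originalTable")

    // 3. Rename the temp table into place (unavailability ends here).
    spark.sql(s"ALTER TABLE $tempTable RENAME TO $originalTable")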

The second option for a quick table-to-table copy is Hive's import/export feature, which can copy from any table to any table very fast (sketched below). But it likewise comes with a second or two of downtime. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ImportExport
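
A rough HiveQL sketch of that flow, run from the Hive CLI or Beeline (table names and the export path are placeholders; these are Hive statements, which Spark SQL does not support):

    -- Export the staged table's data and metadata to an HDFS directory.
    EXPORT TABLE db.events_tmp TO '/tmp/events_export';

    -- Drop the original table first so the import can recreate it;
    -- this is where the brief downtime occurs.
    DROP TABLE IF EXISTS db.events;
    IMPORT TABLE db.events FROM '/tmp/events_export';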

Upvotes: 0
