Reputation: 43
I have a Hive table that is backed by an HDFS path. The table is overwritten by a periodic job and has a few downstream consumers. The table gets dropped while being overwritten, and if a downstream consumer tries to access it during that window, it throws an error and the consumer's job fails. How can I prevent the table from being unavailable?
Here's an approach I tried that doesn't seem to work:
spark.sql(s"ALTER TABLE $originalTable SET LOCATION '$tempTableLocation'"
))spark.sql(s"MSCK REPAIR TABLE $originalTable")
The location seems to be updated when I run DESCRIBE FORMATTED $originalTable, but when I try to load data from the original table it still reads from the previous path.
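One thing worth checking is whether the stale reads come from Spark's cached table metadata rather than the metastore itself; that is an assumption, but here is a minimal sketch of the same steps followed by an explicit refresh (table name and path are hypothetical placeholders):

// Placeholder values; substitute the real table name and staging path.
val originalTable     = "db.my_table"            // hypothetical name
val tempTableLocation = "hdfs:///tmp/my_table"   // hypothetical path

spark.sql(s"ALTER TABLE $originalTable SET LOCATION '$tempTableLocation'")
spark.sql(s"MSCK REPAIR TABLE $originalTable")

// Invalidate Spark's cached metadata/file listing so the next read
// re-resolves the new location instead of the old one.
spark.catalog.refreshTable(originalTable)
// SQL equivalent: spark.sql(s"REFRESH TABLE $originalTable")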
How can I fix this?
Upvotes: 0
Views: 54
Reputation: 7387
First option: I can tweak your process and you can check if it works.
The second option, for quickly copying table-to-table, is to use Hive's import/export feature, which copies from any table to any table very fast. But this behaves similarly, with roughly 1-2 seconds of downtime. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ImportExport
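A minimal sketch of what that export/import route could look like in HiveQL, following the syntax on the wiki page linked above; all table names and the HDFS staging path below are hypothetical placeholders:

-- 1. Export the freshly built table (data + metadata) to an HDFS staging dir.
EXPORT TABLE staging_copy TO '/tmp/hive_export/staging_copy';

-- 2. Recreate the serving table from the export. The window between the DROP
--    and the IMPORT is the brief downtime mentioned above.
DROP TABLE IF EXISTS serving_table;
IMPORT TABLE serving_table FROM '/tmp/hive_export/staging_copy';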
Upvotes: 0