Utkarsh Roy

Reputation: 43

Overwrite a Hive table without downtime

I have a Hive table backed by an HDFS path. The table is overwritten by a periodic job and has a few downstream consumers. The table gets dropped while being overwritten, and if a downstream consumer tries to access it during that window, the read throws an error and the consumer's job fails. How can I prevent the table from becoming unavailable?

Here's an approach I tried which doesn't seem to work (sketched in code below the list):

  1. Write the data to a temporary table (a copy of the original table)
  2. Get the new location of the temporary table
  3. Update the original table's location to the temporary table's location (spark.sql(s"ALTER TABLE $originalTable SET LOCATION '$tempTableLocation'"))
  4. Run spark.sql(s"MSCK REPAIR TABLE $originalTable")
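
Put together, the attempted flow looks roughly like this as a Scala sketch (the table names are placeholders, and df is assumed to hold the freshly computed data):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

    // Placeholder names; `df` is assumed to hold the freshly computed data.
    val originalTable = "db.events"
    val tempTable     = "db.events_tmp"

    // 1. Write the new data to a temporary copy of the table.
    df.write.mode("overwrite").saveAsTable(tempTable)

    // 2. Read the temp table's HDFS location back from the metastore.
    val tempTableLocation = spark.sql(s"DESCRIBE FORMATTED $tempTable")
      .filter("col_name = 'Location'")
      .select("data_type")
      .first()
      .getString(0)

    // 3. Repoint the original table at the new location.
    spark.sql(s"ALTER TABLE $originalTable SET LOCATION '$tempTableLocation'")

    // 4. Re-sync partition metadata against the new location.
    spark.sql(s"MSCK REPAIR TABLE $originalTable")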

The location does appear updated when I run DESCRIBE FORMATTED $originalTable, but when I try to load data from the original table it still reads from the previous path.

How can I fix this?

Upvotes: 0

Views: 54

Answers (1)

Koushik Roy

Reputation: 7387

The first option is a tweak to your process; you can check whether it works in your scenario:

  1. Write the data to a temporary table (a copy of the original table).
  2. Drop the original table.
  3. Use ALTER TABLE ... RENAME to rename the temp table to the original table's name, as sketched below. This is a quick operation and shouldn't take more than a second. You can try it in your scenario.
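
In Spark, that sequence could look roughly like this (a minimal sketch; the table names are placeholders and df is assumed to hold the new data):

    // Placeholder names; `df` is assumed to hold the new data.
    val originalTable = "db.events"
    val tempTable     = "db.events_tmp"

    // 1. Stage the new data in a temporary copy of the table.
    df.write.mode("overwrite").saveAsTable(tempTable)

    // 2. Drop the original table (the brief unavailability starts here).
    spark.sql(s"DROP TABLE IF EXISTS $originalTable")

    // 3. Rename the temp table into place (unavailability ends here).
    spark.sql(s"ALTER TABLE $tempTable RENAME TO $originalTable")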

The second option for a quick table-to-table copy is Hive's import/export feature, which can copy from any table to any table very fast (sketched below). But it likewise comes with a second or two of downtime. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ImportExport
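
A rough HiveQL sketch of that flow, run from the Hive CLI or Beeline (table names and the export path are placeholders; these are Hive statements, which Spark SQL does not support):

    -- Export the staged table's data and metadata to an HDFS directory.
    EXPORT TABLE db.events_tmp TO '/tmp/events_export';

    -- Drop the original table first so the import can recreate it;
    -- this is where the brief downtime occurs.
    DROP TABLE IF EXISTS db.events;
    IMPORT TABLE db.events FROM '/tmp/events_export';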

Upvotes: 0
