Reputation: 75
I have to duplicate Hive tables to another cluster keeping the schema and the hierarchy of my tables, so my question is : What is the safest and proper way to do it, in order to have the exact tables (and databases) copies of Cluster1 into Cluseter2.
I have found ways which globally said:
- hive > export TABLE1;
- distcp hdfs:source_Path hdfs:dest_Path
- hive > import TABLE1; #in Cluster 2
- hive> MSCK REPAIR TABLE TABLE1;
But as I have to copy a numerous number of databases and tables, is there any fast and safe way, like duplicate the state or snapshot of Datawarehouse1 into Datawarehouse1 ...etc ?
Thanks in advance.
Upvotes: 2
Views: 2589
Reputation: 18270
Migration of Schema (Assuming hive metastore is stored in MySQL)
Take the dump of the metastore database
mysqldump -u **** -p***** metastoredb > metastore.sql
Replace the Cluster1 FS URI with Cluster2's FS URI
sed -i 's_hdfs://namenode1:port1_hdfs://namenode2:port2_g' metastore.sql
Move the dump to the target cluster and Restore it.
mysql> create database metastoredb;
mysql> use metastoredb;
mysql> source metastore.sql;
If the target Hive is of different version, run the relevant upgrade scripts.
Migration of Warehouse and External Tables has to be done with distcp
preserving the directory structure.
hadoop distcp hdfs://namenode1:port1/hive/data hdfs://namenode2:port2/hive/data
Hive's export
and import
do not have options for databases.
Upvotes: 1