mttb12
mttb12

Reputation: 75

How to migrate data between clusters?

I have to duplicate Hive tables to another cluster keeping the schema and the hierarchy of my tables, so my question is : What is the safest and proper way to do it, in order to have the exact tables (and databases) copies of Cluster1 into Cluseter2.

I have found ways which globally said:

 - hive > export TABLE1;
 - distcp hdfs:source_Path hdfs:dest_Path
 - hive > import TABLE1; #in Cluster 2
 - hive> MSCK REPAIR TABLE TABLE1;

But as I have to copy a numerous number of databases and tables, is there any fast and safe way, like duplicate the state or snapshot of Datawarehouse1 into Datawarehouse1 ...etc ?

Thanks in advance.

Upvotes: 2

Views: 2589

Answers (1)

franklinsijo
franklinsijo

Reputation: 18270

Migration of Schema (Assuming hive metastore is stored in MySQL)

Take the dump of the metastore database

mysqldump -u **** -p***** metastoredb > metastore.sql

Replace the Cluster1 FS URI with Cluster2's FS URI

sed -i 's_hdfs://namenode1:port1_hdfs://namenode2:port2_g' metastore.sql

Move the dump to the target cluster and Restore it.

mysql> create database metastoredb;
mysql> use metastoredb;
mysql> source metastore.sql;

If the target Hive is of different version, run the relevant upgrade scripts.

Migration of Warehouse and External Tables has to be done with distcp preserving the directory structure.

hadoop distcp hdfs://namenode1:port1/hive/data hdfs://namenode2:port2/hive/data

Hive's export and import do not have options for databases.

Upvotes: 1

Related Questions