Supun Wijerathne

Reputation: 12938

How to replicate data in one Hadoop cluster to another Hadoop cluster?

I am new to Apache Hadoop. We have one Hadoop cluster [1] that already holds some data, and another Hadoop cluster [2] that is empty. What is the simplest and preferred way to replicate the data from [1] into [2]?

Upvotes: 0

Views: 2040

Answers (1)

RojoSam

Reputation: 1496

You can use DistCp (distributed copy). It is a tool that lets you copy data between clusters, or to and from a different file system such as S3 or an FTP server.

https://hadoop.apache.org/docs/r1.2.1/distcp2.html

To copy data from an external cluster, you must specify the fully qualified path, including the other cluster's NameNode host and port: hdfs://OtherClusterNN:port/path
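As a rough illustration of the fully qualified paths described above, here is a minimal Python sketch that assembles a `hadoop distcp` command line. The helper name `build_distcp_command`, the host names, the port 8020, and the `/data/logs` path are all placeholders I made up for the example; only `hadoop distcp` and its `-update` flag come from the tool itself.

```python
import shlex
import subprocess  # only needed if you uncomment the run() call below


def build_distcp_command(src_uri, dst_uri, update=False):
    """Return the argv list for a `hadoop distcp` call (hypothetical helper)."""
    for uri in (src_uri, dst_uri):
        # DistCp between clusters needs scheme://host:port/path, not a bare path.
        if "://" not in uri:
            raise ValueError(f"expected a fully qualified URI, got {uri!r}")
    cmd = ["hadoop", "distcp"]
    if update:
        # -update copies only files that are missing or differ at the target.
        cmd.append("-update")
    cmd += [src_uri, dst_uri]
    return cmd


# Placeholder NameNode addresses; replace with your own clusters.
cmd = build_distcp_command(
    "hdfs://nn1.example.com:8020/data/logs",
    "hdfs://nn2.example.com:8020/data/logs",
)
print(shlex.join(cmd))
# subprocess.run(cmd, check=True)  # uncomment on a host with the hadoop CLI
```

The command is built as a list rather than a single string so it can be passed to `subprocess.run` without shell quoting issues.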

This tool launches a MapReduce job that copies data in parallel from any kind of source supported by the Hadoop FileSystem library, such as HDFS, FTP, S3, or Azure (in the latest versions).

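To copy data between different versions of Hadoop, you must use HftpFileSystem for one of them instead of the HDFS protocol. HftpFileSystem is read-only, so it is used for the source, and the DistCp job is run on the destination cluster.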
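For the cross-version case, a small sketch of how the source URI might be formed. The helper `hftp_source_uri` and the host names are hypothetical; the default port 50070 is my assumption based on the usual NameNode HTTP address in older releases, so check `dfs.http.address` on your source cluster. Also note that, as far as I know, newer Hadoop releases point to `webhdfs://` for this purpose instead of HFTP.

```python
def hftp_source_uri(namenode_host, path, http_port=50070):
    """Build an hftp:// source URI (hypothetical helper).

    HFTP talks to the NameNode's HTTP port, not its RPC port; 50070 is an
    assumed default and may differ on your cluster.
    """
    if not path.startswith("/"):
        raise ValueError(f"path must be absolute, got {path!r}")
    return f"hftp://{namenode_host}:{http_port}{path}"


# Placeholder hosts: read from the old cluster over HFTP, write to the new
# cluster over HDFS. Run the resulting command on the destination cluster.
src = hftp_source_uri("old-nn.example.com", "/data/logs")
dst = "hdfs://new-nn.example.com:8020/data/logs"
print("hadoop distcp", src, dst)
```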

Upvotes: 5
