Piyush Sharma

Reputation: 35

Copy local files in Hadoop File System

What is the fastest way to copy files in HDFS programmatically? I have tried DistCp but couldn't get the appropriate content.

Upvotes: 0

Views: 4008

Answers (3)

Tariq

Reputation: 34184

distcp works perfectly fine for both local FS to HDFS and HDFS to HDFS copying. However, it doesn't provide the benefit of MapReduce's high parallelism when the input data resides on the local FS (a non-distributed store) and not on HDFS. So using either of the two will give you almost the same performance, which obviously depends on the hardware and the size of the input data.

BTW, what do you mean by "tried DistCp but couldn't get the appropriate content"?

Upvotes: 2

Jerome Serrano

Reputation: 1855

Distcp is certainly the fastest way to copy a large amount of data over HDFS. I would suggest trying it first from the command line before calling it from your favorite programming language.

hadoop distcp -p -update "hdfs://A:8020/user/foo/bar" "hdfs://B:8020/user/foo/baz"

-p preserves file status; -update overwrites data if a file is already present but has a different size.

Since Distcp is written in Java, you shouldn't have any difficulty calling it from a Java application. You can also use your favorite scripting language (Python, bash, etc.) to run hadoop distcp like any other command-line application.
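As a minimal sketch of that last point, the snippet below builds the distcp command line shown above and runs it with ProcessBuilder; the class name DistCpRunner is illustrative, and the actual launch line is left commented out since it only works on a machine with the hadoop binary and cluster access:

```java
import java.util.Arrays;
import java.util.List;

public class DistCpRunner {
    // Build the same command line as the CLI example above.
    static List<String> buildCommand(String src, String dst) {
        return Arrays.asList("hadoop", "distcp", "-p", "-update", src, dst);
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = buildCommand("hdfs://A:8020/user/foo/bar",
                                        "hdfs://B:8020/user/foo/baz");
        System.out.println(String.join(" ", cmd));
        // Uncomment on a machine with Hadoop installed:
        // int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```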

Upvotes: 0

Evgeny Benediktov

Reputation: 1399

 // Needs org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.FileSystem,
 // and org.apache.hadoop.fs.Path on the classpath.
 FileSystem fs = FileSystem.get(conf);
 fs.copyFromLocalFile(new Path("/home/me/localdirectory/"), new Path("/me/hadoop/hdfsdir"));

DistCp works only intra-cluster (from hdfs to hdfs).

Upvotes: 0
