User12345

Reputation: 5480

Copy data from one HDFS directory to another continuously

I have a directory in HDFS that receives new files every 2 days. I want to copy all the files in this directory to another directory, so that whenever a new file arrives, it is also copied to the duplicate directory.

How can we do that in HDFS?

I know we can do that in Linux using rsync. Is there a similar method in HDFS?

Upvotes: 1

Views: 4993

Answers (1)

franklinsijo

Reputation: 18270

No, HDFS has no built-in file sync method. You have to run either hdfs dfs -cp or hadoop distcp, manually or through a scheduler such as cron.

If there are many files, distcp is preferred, since it copies them in parallel as a MapReduce job.

hadoop distcp -update <src_dir> <dest_dir>

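The -update flag overwrites a destination file if the source and destination differ in size, blocksize, or checksum. Files that already match are skipped, and files missing from the destination are copied.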
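To run this on a schedule, the distcp call can be wrapped in a small script. The following is only a minimal sketch: the function name sync_hdfs_dirs and the HADOOP_BIN variable are made up for illustration, and it assumes the hadoop client is on the PATH of the user running it.

```shell
#!/bin/sh
# Sketch: one-way sync of an HDFS directory using `hadoop distcp -update`.
# sync_hdfs_dirs and HADOOP_BIN are hypothetical names, not part of Hadoop.

sync_hdfs_dirs() {
    src_dir="$1"
    dest_dir="$2"

    # Refuse to run without both paths.
    if [ -z "$src_dir" ] || [ -z "$dest_dir" ]; then
        echo "usage: sync_hdfs_dirs <src_dir> <dest_dir>" >&2
        return 2
    fi

    # -update copies files that are missing or differ at the destination
    # and skips the ones that already match.
    # HADOOP_BIN lets you point at a specific hadoop binary (default: hadoop).
    "${HADOOP_BIN:-hadoop}" distcp -update "$src_dir" "$dest_dir"
}
```

Calling sync_hdfs_dirs <src_dir> <dest_dir> from a script that cron runs, say, once a day should pick up any file that arrived since the last run. It is still a periodic copy rather than a continuous sync, so a new file only appears in the duplicate directory after the next scheduled run.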

Upvotes: 3
