Reputation: 5480
I have a directory in HDFS that receives new files every 2 days. I want to copy all the files in this directory to another directory, so that whenever a new file arrives it is also copied to the duplicate directory.
How can we do that in HDFS?
I know we can do that in Linux using rsync. Is there a similar method in HDFS?
Upvotes: 1
Views: 4993
Reputation: 18270
No, there are no file sync methods available with HDFS. You have to run either `hdfs dfs -cp` or `hadoop distcp`, manually or through a scheduler such as `cron`.
If there are many files, `distcp` is preferred.
hadoop distcp -update <src_dir> <dest_dir>
The `-update` flag overwrites the destination if source and destination differ in size, blocksize, or checksum.
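As a rough sketch of how the distcp command could be wrapped for a scheduler, the function below just validates its two arguments and runs `hadoop distcp -update`. The function name `sync_hdfs_dir` and the `HADOOP_BIN` override are illustrative assumptions, not part of Hadoop.

```shell
#!/bin/sh
# Sketch: one-way copy of an HDFS directory using distcp -update.
# sync_hdfs_dir and HADOOP_BIN are hypothetical names chosen for this example;
# HADOOP_BIN only lets you point at a different hadoop binary.
sync_hdfs_dir() {
    src_dir="$1"
    dest_dir="$2"

    # Refuse to run without both directories.
    if [ -z "$src_dir" ] || [ -z "$dest_dir" ]; then
        echo "usage: sync_hdfs_dir <src_dir> <dest_dir>" >&2
        return 2
    fi

    # -update copies files that are missing or different at the destination.
    "${HADOOP_BIN:-hadoop}" distcp -update "$src_dir" "$dest_dir"
}
```

Calling `sync_hdfs_dir /data/incoming /data/incoming_copy` (hypothetical paths) from a script that cron runs daily should pick up new files, because `-update` skips files that already match at the destination. Note that this is a one-way copy and does not remove files from the destination that were deleted at the source.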
Upvotes: 3