aiman

Reputation: 1103

Hadoop: How to move HDFS files in one directory to another directory?

I have an HDFS source directory and a destination archive directory in HDFS. At the beginning of every run of my job, I need to move (or copy, then delete) all the part files present in my source directory to my archive directory.

SparkSession spark = SparkSession.builder().getOrCreate();
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
String hdfsSrcDir = "hdfs://clusterName/my/source";
String archiveDir = "hdfs://clusterName/my/archive";
try {
    FileSystem fs = FileSystem.get(new URI(hdfsSrcDir), jsc.hadoopConfiguration());
} catch (URISyntaxException | IOException e) {
    // handle the error
}

I don't know how to proceed further. Presently my fs object has a reference only to my source directory, and I don't believe creating a second FileSystem object (fs2) for the archive location would help.

I have found FileSystem.rename(), but it takes individual file paths as parameters, and I need to move /my/source/* to /my/archive/.

Upvotes: 1

Views: 1270

Answers (1)

Ajay Kharade

Reputation: 1525

Check if this works for you:

Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "hdfs://xyz:1234");
FileSystem filesystem = FileSystem.get(configuration);
// Copy the source to the destination, then remove the source.
// Passing true as the deleteSource argument would instead make
// FileUtil.copy delete the source itself, turning the copy into a move.
FileUtil.copy(filesystem, new Path("src/path"),
              filesystem, new Path("dst/path"), false, configuration);
filesystem.delete(new Path("src/path"), true);
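On the cluster, the usual pattern is to list the source directory (FileSystem.listStatus) and rename each part file into the archive one at a time. Since that requires a running HDFS, here is a self-contained local-filesystem sketch of the same move-every-file pattern using java.nio.file; the directory names and the ArchiveMove class are illustrative, and FileSystem.listStatus/rename play the analogous roles in the HDFS API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

public class ArchiveMove {
    // Move every regular file from srcDir into archiveDir,
    // overwriting files of the same name already in the archive.
    static void moveAll(Path srcDir, Path archiveDir) throws IOException {
        Files.createDirectories(archiveDir);
        try (Stream<Path> files = Files.list(srcDir)) {
            for (Path file : (Iterable<Path>) files::iterator) {
                if (Files.isRegularFile(file)) {
                    Files.move(file, archiveDir.resolve(file.getFileName()),
                               StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Demonstrate with temporary directories standing in for
        // /my/source and /my/archive.
        Path src = Files.createTempDirectory("source");
        Path archive = Files.createTempDirectory("archive");
        Files.writeString(src.resolve("part-00000"), "a");
        Files.writeString(src.resolve("part-00001"), "b");
        moveAll(src, archive);
        try (Stream<Path> left = Files.list(src)) {
            System.out.println("files left in source: " + left.count());
        }
    }
}
```

In HDFS the per-file rename is a cheap metadata operation, so this loop avoids rewriting the data the way a copy-then-delete does.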

Upvotes: 1
