Reputation: 1103
I have an HDFS source directory and a destination archive directory, both in HDFS. At the beginning of every run of my job, I need to move (or copy and then delete) all the part files present in the source directory to the archive directory.
SparkSession spark = SparkSession.builder().getOrCreate();
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
String hdfsSrcDir = "hdfs://clusterName/my/source";
String archiveDir = "hdfs://clusterName/my/archive";
try {
    FileSystem fs = FileSystem.get(new URI(hdfsSrcDir), jsc.hadoopConfiguration());
} catch (IOException | URISyntaxException e) {
    // handle the exception
}
I don't know how to proceed further. Presently my fs object only has a reference to my source directory; creating an fs2 with the archive location won't help, I believe. I have found FileSystem.rename(), but that takes file names as parameters. I need to move /my/source/* to /my/archive/.
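For context, rename() operates on one Path at a time and has no wildcard form, which is why I am stuck; for example (hypothetical part-file name):

// rename() moves a single path; there is no glob/wildcard overload
fs.rename(new Path(hdfsSrcDir + "/part-00000"), new Path(archiveDir + "/part-00000"));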
Upvotes: 1
Views: 1270
Reputation: 1525
Check if this works for you:
// Point the FileSystem at the cluster's default HDFS
Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "hdfs://xyz:1234");
FileSystem filesystem = FileSystem.get(configuration);

// Copy the source path to the destination, then delete the source recursively
FileUtil.copy(filesystem, new Path("src/path"),
        filesystem, new Path("dst/path"), false, configuration);
filesystem.delete(new Path("src/path"), true);
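Note that the fifth argument of FileUtil.copy is deleteSource, so passing true there removes the source once the copy succeeds and makes the explicit delete() call unnecessary. If both directories are on the same HDFS, you can also do a plain move (no data copy) by renaming each file into the destination; a rough sketch, assuming the destination directory already exists:

for (FileStatus status : filesystem.listStatus(new Path("src/path"))) {
    // rename() is a metadata-only move within a single HDFS
    filesystem.rename(status.getPath(), new Path("dst/path", status.getPath().getName()));
}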
Upvotes: 1