Reputation: 177
I am using org.apache.hadoop.fs to check whether a directory in HDFS is empty or not. I looked through the FileSystem API but couldn't find anything close to it. Basically, I want to check whether a directory is empty or how many files exist in it.
I was able to find the "exists" method, but that only tells whether the path exists or not.
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val containsFile = fs.exists(new Path(dataPath))
Upvotes: 0
Views: 4836
Reputation: 11275
Copy-paste solution:
FileSystem.get(sc.hadoopConfiguration()).listFiles(path, true).hasNext()
true means the directory is not empty, false means it is empty.
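For context, a minimal self-contained Scala sketch of this approach, assuming a SparkSession named spark; the path /data/input is a placeholder:

import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
// listFiles returns a RemoteIterator over the files under the path;
// with recursive = true it also descends into subdirectories.
val isNonEmpty = fs.listFiles(new Path("/data/input"), true).hasNext

Note that listFiles iterates over files only, so a directory containing nothing but empty subdirectories will still report as empty with this check.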
Upvotes: 0
Reputation: 227
You can get a ContentSummary and check the count of files or directories:
ContentSummary cs = fileSystem.getContentSummary(new Path("path"));
long fileCount = cs.getFileCount();
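For reference, a hedged Scala equivalent of the same idea, again assuming a SparkSession named spark and a placeholder path:

import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
// ContentSummary aggregates counts over the whole subtree under the path.
val cs = fs.getContentSummary(new Path("/data/input"))
val fileCount = cs.getFileCount       // number of files in the subtree
val dirCount  = cs.getDirectoryCount  // number of directories, including the path itself
// An empty directory has no files and no subdirectories beneath it.
val isEmpty = fileCount == 0 && dirCount == 1

Be aware that getContentSummary walks the entire subtree, so it can be expensive on large directories.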
Upvotes: 2
Reputation: 2099
I would apply listFiles() from the FileSystem class, e.g.:
FileSystem.get(sc.hadoopConfiguration()).listFiles(..., true)
Then ask whether there are elements with the hasNext() method of the returned RemoteIterator, as sketched below.
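A short Scala sketch of that idea, which also answers the "how many files" part of the question by draining the iterator (the path is illustrative):

import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val it = fs.listFiles(new Path("/data/input"), true)

// hasNext alone answers the emptiness question without listing everything.
val nonEmpty = it.hasNext

// To count the files, drain the RemoteIterator (it has no size method).
var fileCount = 0L
while (it.hasNext) { it.next(); fileCount += 1 }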
Upvotes: 1