Vin

Reputation: 177

How to check if HDFS directory is empty in Spark

I am using org.apache.hadoop.fs to check whether a directory in HDFS is empty or not. I looked up the FileSystem API but couldn't find anything close to it. Basically I want to check if the directory is empty or how many files exist in it.

I was able to find the exists method, but it only tells whether the path exists or not.

import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val containsFile = fs.exists(new Path(dataPath))

Upvotes: 0

Views: 4836

Answers (3)

Atais

Reputation: 11275

Copy-paste solution

FileSystem.get(sc.hadoopConfiguration).listFiles(path, true).hasNext()

true means the directory is not empty; false means it is empty.
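
A fuller Scala sketch of the same check (assuming an existing SparkSession named spark and a directory path string dataPath, both hypothetical names):

import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
// listFiles returns a RemoteIterator over the files under the path (recursive when the second argument is true);
// hasNext is true as soon as one file is found, so negating it tells us the directory is empty.
val dirIsEmpty = !fs.listFiles(new Path(dataPath), true).hasNext()

Note that listFiles iterates over files only, so a directory containing nothing but empty subdirectories is still reported as empty by this check.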

Upvotes: 0

Mukesh

Reputation: 227

You can get the ContentSummary and check the count of files or directories:

ContentSummary cs = fileSystem.getContentSummary(new Path("path"));
long fileCount = cs.getFileCount();
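
The same idea in Scala (a sketch, assuming fs is the FileSystem handle from the question and dataPath the directory to inspect):

import org.apache.hadoop.fs.{ContentSummary, Path}

// getContentSummary walks the whole subtree, so it can be costly on large directories.
val cs: ContentSummary = fs.getContentSummary(new Path(dataPath))
val fileCount = cs.getFileCount       // files anywhere under the path
val dirCount = cs.getDirectoryCount   // directories under the path, counting the path itself
val isEmpty = fileCount == 0 && dirCount == 1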

Upvotes: 2

rsantiago

Reputation: 2099

I would apply:

  1. listFiles() from the FileSystem class, e.g.:

    FileSystem.get(sc.hadoopConfiguration).listFiles(..., true)

  2. Check whether there are any elements with the hasNext() method of the returned RemoteIterator, as in the sketch below.
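
For reuse, the two steps can be wrapped into a small helper (a sketch; isDirEmpty and its parameters are hypothetical names, and the path is assumed to exist):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// True when the directory at `path` contains no files anywhere below it.
// Note: listFiles throws FileNotFoundException if the path does not exist.
def isDirEmpty(path: String, conf: Configuration): Boolean =
  !FileSystem.get(conf).listFiles(new Path(path), true).hasNext()

// e.g. isDirEmpty(dataPath, spark.sparkContext.hadoopConfiguration)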

Upvotes: 1
