kee

Reputation: 11629

How to check whether a Hadoop sequence file is empty

I noticed that when a sequence file contains no records, its size is always the same (128 bytes in my case, with my compression settings). Is there an API, or some other way, to check whether a file has no content?

Upvotes: 0

Views: 1534

Answers (2)

Prasad D

Reputation: 1504

One simple way to open and inspect a sequence file stored in HDFS or S3:

hadoop fs -text path_of_sequence_file

If the command prints nothing, the file contains no records.

Upvotes: 0

Chris White

Reputation: 30089

No, apart from opening the file and trying to read the first key/value pair, there isn't. The reason is that there is no header recording the number of records in each 'block' (mainly because the data is streamed out: when the header is written, the number of keys is not yet known).
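With the Hadoop library on the classpath, this means opening a `SequenceFile.Reader` and checking whether the first `next(key, value)` call returns `false`. As a dependency-free sketch of the same idea, the class below parses the version-6 header that `SequenceFile.Writer` emits (the `SEQ` magic plus version byte, key and value class names, compression flags, optional codec class name, metadata, and a 16-byte sync marker) and reports whether any bytes follow it. It assumes the file was written and closed by the standard writer, which in the Hadoop 2.x/3.x sources writes nothing after the header when no records were appended. The class name `SeqFileEmptyCheck` is hypothetical, and this only detects the presence of record bytes; it does not validate them.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/**
 * Hypothetical sketch: decides whether a SequenceFile with a version-6 header
 * (as written by Hadoop's SequenceFile.Writer) has any bytes after its header.
 * ASSUMPTION: a closed writer with no records writes nothing past the header.
 */
public final class SeqFileEmptyCheck {

    private SeqFileEmptyCheck() {}

    /** Returns true if at least one byte follows the header. The caller owns and closes the stream. */
    public static boolean hasRecords(InputStream raw) {
        try {
            DataInputStream in = new DataInputStream(raw);
            byte[] magic = new byte[4];
            in.readFully(magic);
            if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
                throw new IOException("not a SequenceFile");
            }
            if (magic[3] != 6) {
                throw new IOException("sketch only handles version-6 headers, got " + magic[3]);
            }
            skipText(in);                          // key class name
            skipText(in);                          // value class name
            boolean compressed = in.readBoolean();
            in.readBoolean();                      // block-compression flag (not needed here)
            if (compressed) {
                skipText(in);                      // compression codec class name
            }
            int metadataEntries = in.readInt();    // metadata: entry count, then key/value Text pairs
            for (int i = 0; i < 2 * metadataEntries; i++) {
                skipText(in);
            }
            in.readFully(new byte[16]);            // 16-byte sync marker ends the header
            return in.read() != -1;                // anything left over is record data
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Skips a Hadoop Text value: a variable-length size followed by that many UTF-8 bytes. */
    private static void skipText(DataInputStream in) throws IOException {
        long len = readVLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) {
            throw new IOException("bad Text length " + len);
        }
        in.readFully(new byte[(int) len]);
    }

    /** Decodes Hadoop's variable-length long encoding (mirrors WritableUtils.readVLong). */
    private static long readVLong(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first;                          // single-byte value in [-112, 127]
        }
        boolean negative = first < -120;
        int extraBytes = negative ? -(first + 120) : -(first + 112);
        long value = 0;
        for (int i = 0; i < extraBytes; i++) {
            value = (value << 8) | (in.readByte() & 0xFF);
        }
        return negative ? ~value : value;
    }
}
```

On HDFS, the stream returned by `FileSystem.open(path)` could be passed straight to `hasRecords`; only the header and a single extra byte are read, so the check stays cheap even for large files.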

There have been some previous threads about how to avoid creating these 'empty' files, but the only real way to do it would be to write your own OutputFormat and OutputCommitter that track the number of values output and don't commit the file if no data was written.

Upvotes: 2
