Reputation: 11629
I noticed that in that case the size of those files is constant (128 bytes in my case with my compression choice). Is there an API or a way to check whether a file doesn't have any content?
Upvotes: 0
Views: 1534
Reputation: 1504
One simple way to open and inspect a sequence file in HDFS or S3 is:
hadoop fs -text path_of_sequence_file
Upvotes: 0
Reputation: 30089
Other than opening the file and trying to read the first key/value pair, no, there isn't. The reason is that there is no header recording the number of records in each 'block': the data is streamed out, so when the header is written the number of keys is not yet known.
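A minimal sketch of the "open it and try to read the first key" check, using Hadoop's `SequenceFile.Reader`. The class name `SequenceFileEmptyCheck` and the method `isEmpty` are made up for illustration, and the code assumes the keys are `Writable` types (the default serialization); files written with another serialization framework would need a different read call.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// Hypothetical helper class, not part of Hadoop.
public class SequenceFileEmptyCheck {

    /** Returns true if the sequence file at 'path' holds no key/value pairs. */
    public static boolean isEmpty(Configuration conf, Path path) throws IOException {
        try (SequenceFile.Reader reader =
                 new SequenceFile.Reader(conf, SequenceFile.Reader.file(path))) {
            // Assumption: keys are Writable. The key class is recorded in the
            // file header, so we can instantiate a holder for it reflectively.
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            // next(key) reads the next key (skipping its value) and returns
            // false once the end of the file is reached.
            return !reader.next(key);
        }
    }

    public static void main(String[] args) throws IOException {
        // The path comes from the command line; it is resolved against the
        // file system configured in 'conf' (HDFS, S3, local, ...).
        Path path = new Path(args[0]);
        System.out.println(isEmpty(new Configuration(), path) ? "empty" : "non-empty");
    }
}
```

Only the header and at most one record are read, so the check stays cheap even for large files, but it still costs one open call per file.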
There have been some previous threads about how to avoid creating these 'empty' files, but the only real way of doing it would be to create your own OutputFormat and OutputCommitter that track the number of values output and don't commit the file if no data was written.
Upvotes: 2