Reputation: 13
I have a binary file on the Hadoop distributed file system that I want to read. I am using FSDataInputStream (which extends DataInputStream). I have a buffer of length "len", and I call readBytes = stream.read(buffer) to read "len" bytes from the file into the buffer. BUT the actual number of bytes read (readBytes) is less than the buffer size (len), even though I know the file contains "len" bytes. So why does FSDataInputStream read fewer bytes than I ask it to? Any IDEA?
Upvotes: 1
Views: 2510
Reputation: 125
If you are positioned near the end of a block of the file, such that "len" bytes forward from that position falls somewhere in the next block, then stream.read(buffer) will return only the bytes remaining in the current block. On the subsequent read you will start getting the bytes from the next block of the file.
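This behavior can be simulated without a cluster. The hypothetical `BlockBoundedStream` below caps every read at 64 bytes, standing in for a read that stops at a block boundary; the class and the 64-byte "block size" are illustrative, not part of the Hadoop API:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stream that never returns more than 64 bytes per call,
// simulating a read() that stops at a block boundary.
class BlockBoundedStream extends ByteArrayInputStream {
    BlockBoundedStream(byte[] buf) { super(buf); }

    @Override
    public int read(byte[] b, int off, int len) {
        // Serve at most 64 bytes, even if the caller asked for more.
        return super.read(b, off, Math.min(len, 64));
    }
}

public class ShortReadDemo {
    public static void main(String[] args) throws IOException {
        InputStream in = new BlockBoundedStream(new byte[100]);
        byte[] buf = new byte[100];

        // First read stops at the simulated boundary.
        int first = in.read(buf);                              // 64, not 100
        // Second read picks up the remaining bytes.
        int second = in.read(buf, first, buf.length - first);  // 36
        System.out.println(first + " " + second);              // prints "64 36"
    }
}
```

Even though 100 bytes are available, the first read returns only 64 and the rest arrive on the next call, which is exactly what the contract of read(byte[]) allows.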
Upvotes: 0
Reputation: 139931
The JavaDocs for DataInputStream.read(byte[]) and InputStream.read(byte[]) state pretty clearly that the method will read "some number of bytes", up to the length of the byte array. There are several reasons why the call might return before the byte array is filled.
You shouldn't call the read(byte[]) method just once to consume bytes from a stream; you need to loop and continue reading from the stream until it returns -1.
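That loop can be sketched as follows. This is a generic helper over plain java.io streams (the method name `readFully` here is just illustrative), but since FSDataInputStream extends DataInputStream, the same pattern applies to it:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadLoop {
    // Keep calling read() until the buffer is full or the stream ends.
    // Returns the total number of bytes actually read.
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break; // end of stream reached before the buffer filled
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Any InputStream works; a FSDataInputStream could be passed the same way.
        InputStream in = new ByteArrayInputStream(new byte[100]);
        byte[] buf = new byte[100];
        System.out.println(readFully(in, buf)); // prints "100"
    }
}
```

Note that DataInputStream already provides readFully(byte[]), which does this loop for you and throws EOFException if the stream ends early; the explicit loop above is useful when a short final read is acceptable.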
Upvotes: 5