dragon525

Reputation: 1

Java GZIPInputStream.read() function

In the following loop, when instream is a GZIPInputStream, I found that the values of c look completely random, either greater or less than 1024. But when instream is a FileInputStream, the returned value is always 1024.

int c;
while ((c = instream.read(buffer, offset, 1024)) != -1)
    System.out.println("Bytes read: " + c);

The input source file is much larger than 1024 bytes. Why is the value returned by GZIPInputStream unpredictable? Shouldn't it always read the full 1024 bytes I asked for? Thanks!

Upvotes: 0

Views: 324

Answers (2)

brettw

Reputation: 11114

It's just an artifact of compression. Typically, a compressed block in a GZIP stream (blocks vary in size) cannot be read until the entire block has been decompressed.

You are reading fixed-size blocks:

0           1024           2048           3072           4096...

But if the compressed blocks' boundaries (measured in decompressed bytes) look like this:

0       892     1201        2104         2924 ...

You're going to get a first read of 892 bytes, then 309 (1201-892), then 903 (2104-1201), and so on. This is a slight oversimplification, but not by much.
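
A small self-contained sketch of the effect: it gzips some made-up sample data in memory and then runs the question's loop over a GZIPInputStream, printing what each read(buffer, 0, 1024) call returns. The class and method names (GzipReadSizes, sampleData, gzip, readSizes) and the sample data are hypothetical. The exact sequence of sizes depends on the JDK's inflater and its internal buffering, so no particular sequence should be expected; what is guaranteed is that each call returns between 1 and 1024 bytes (or -1 at the end) and that the sizes add up to the uncompressed length.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipReadSizes {
    // Hypothetical sample: 100,000 bytes alternating between 1000 pseudo-random
    // bytes (hard to compress) and 1000 copies of 'a' (very easy to compress).
    static byte[] sampleData() {
        byte[] data = new byte[100_000];
        Random rnd = new Random(42);
        for (int i = 0; i < data.length; i++) {
            data[i] = (i % 2000 < 1000) ? (byte) rnd.nextInt(256) : (byte) 'a';
        }
        return data;
    }

    // Compresses the data in memory so the example needs no files.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Same loop as in the question: ask for 1024 bytes each time and
    // record how many bytes each call actually returned.
    static List<Integer> readSizes(InputStream in) throws IOException {
        byte[] buffer = new byte[1024];
        List<Integer> sizes = new ArrayList<>();
        int c;
        while ((c = in.read(buffer, 0, 1024)) != -1) {
            sizes.add(c);
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip(sampleData());
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            for (int c : readSizes(in)) {
                System.out.println("Bytes read: " + c);
            }
        }
    }
}
```

For comparison, passing new ByteArrayInputStream(sampleData()) to readSizes, with no compression involved, gives 97 reads of 1024 bytes followed by one of 672 (100,000 = 97 × 1024 + 672).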

As Miserable Variable commented above, read should never return MORE than 1024; otherwise that would imply a buffer overrun.
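
To illustrate the bound, here is a hedged sketch built on the general InputStream.read(byte[] b, int off, int len) contract, using a ByteArrayInputStream as a stand-in for any stream: a call returns -1 or at most len bytes and writes only into b[off] through b[off + len - 1], and a len larger than b.length - off is rejected with an IndexOutOfBoundsException rather than overrunning the array. The class name ReadContractDemo and the helper boundedRead are hypothetical; boundedRead just re-checks the return value.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class ReadContractDemo {
    // Calls read and re-checks the return-value part of the contract:
    // the result must be -1 or between 0 and len.
    static int boundedRead(InputStream in, byte[] buffer, int off, int len) throws IOException {
        int c = in.read(buffer, off, len);
        if (c < -1 || c > len) {
            throw new IllegalStateException("read returned " + c + " for len " + len);
        }
        return c;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[5000];
        Arrays.fill(source, (byte) 7);
        byte[] buffer = new byte[16];
        Arrays.fill(buffer, (byte) -1); // -1 marks "never written"

        // Ask for 8 bytes starting at index 4 of a 16-byte buffer.
        int c = boundedRead(new ByteArrayInputStream(source), buffer, 4, 8);
        System.out.println("Bytes read: " + c);
        System.out.println(Arrays.toString(buffer));

        // Asking for more room than the buffer has after the offset is
        // rejected up front instead of writing past the end of the array.
        try {
            new ByteArrayInputStream(source).read(new byte[1024], 100, 1024);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Rejected: " + e.getClass().getSimpleName());
        }
    }
}
```

Running it prints Bytes read: 8, and the buffer keeps the -1 marker everywhere except indices 4 to 11. For the loop in the question this means that, with a nonzero offset, buffer must be at least offset + 1024 bytes long.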

Upvotes: 1

Tassos Bassoukos

Reputation: 16142

No, the returned value does not need to be equal to 1024. Consider what should be returned in the case of a file that is only 4 bytes long. Always use the returned value for processing. Also, depending on the kind of stream, it may be less than you would expect due to circumstances out of your control (e.g. a network that only delivers 512 bytes/sec).
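
Following the advice to always use the returned value, this is a minimal sketch of a loop that keeps reading until a buffer is full or the stream ends. The trickle stream, which returns at most 3 bytes per call, is a hypothetical stand-in for the slow network mentioned above; FillBuffer and readUpTo are likewise illustrative names, not library APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class FillBuffer {
    // Keeps calling read until len bytes have arrived or the stream ends.
    // Returns the number of bytes actually stored, which is less than len
    // only when the end of the stream was reached.
    static int readUpTo(InputStream in, byte[] buffer, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int c = in.read(buffer, off + total, len - total);
            if (c == -1) {
                break; // end of stream: the caller must use 'total', not 'len'
            }
            total += c;
        }
        return total;
    }

    // Hypothetical stream that hands out at most 3 bytes per call,
    // standing in for a slow network connection.
    static InputStream trickle(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        InputStream in = trickle(data);
        byte[] buffer = new byte[4];
        int n;
        while ((n = readUpTo(in, buffer, 0, buffer.length)) > 0) {
            System.out.println("Got " + n + " bytes");
        }
    }
}
```

With 10 bytes of input and a 4-byte buffer it prints Got 4 bytes twice and then Got 2 bytes: the last chunk is short because the stream ended, which is the 4-byte-file case in miniature. On Java 9 and later, InputStream.readNBytes(byte[], int, int) offers the same fill-or-end-of-stream behavior, and DataInputStream.readFully does too but throws EOFException if the stream ends early.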

Upvotes: 0
