deamon

Reputation: 92449

How to load the first x bytes from a URL with Java / Scala?

I want to read the first x bytes from a java.net.URLConnection (although I'm not forced to use this class - other suggestions welcome).

My code looks like this:

val head = new Array[Byte](2000)  
new BufferedInputStream(connection.getInputStream).read(head)
IOUtils.toString(new ByteArrayInputStream(head), charset)

It works, but does this code load only the first 2000 bytes from the network?

Next trial

As 'JB Nizet' said, a buffered input stream is not useful here, so I tried it with an InputStreamReader:

val head = new Array[Char](2000)  
new InputStreamReader(connection.getInputStream, charset).read(head)
new String(head)

This code may be better, but the load times are about the same. So does this procedure limit the number of bytes transferred?

Upvotes: 0

Views: 788

Answers (2)

Petr

Reputation: 63359

You can use IOUtils.read(Reader, char[]) from Apache Commons IO. Pass it a 2000-character buffer and it will fill the buffer with as many characters as are available, up to 2000.

Be sure you understand the objections in the other answers/comments, in particular:

  • Don't use Buffered... wrappers; they read ahead, which goes against your intention.
  • If you read textual data, use a Reader to read 2000 characters instead of an InputStream reading 2000 bytes. The proper procedure is to determine the character encoding from the response headers (Content-Type) and pass that encoding to the InputStreamReader.
  • Calling plain read(char[]) on a Reader is not guaranteed to fill the array you give it. It can read as little as one character, no matter how big the array is!
  • Don't forget to close the reader afterwards.
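
The points above can be combined into a minimal sketch that assumes plain java.net.URLConnection and nothing beyond the JDK. The class name HeadReader and the helpers charsetOf and readHead are hypothetical names chosen for illustration. The Content-Type parsing is deliberately simplistic, and the UTF-8 fallback is an assumption, not something the server guarantees.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class HeadReader {

    // Hypothetical helper: extracts the charset parameter from a Content-Type
    // value such as "text/html; charset=ISO-8859-1". Falls back to UTF-8
    // (an assumption) when the header or parameter is missing or unknown.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // illegal or unsupported charset name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Reads up to maxChars characters, looping because a single read() call
    // may return fewer characters than requested. Closes the reader when done.
    static String readHead(InputStream in, Charset charset, int maxChars) throws IOException {
        char[] buffer = new char[maxChars];
        int filled = 0;
        try (Reader reader = new InputStreamReader(in, charset)) {
            while (filled < maxChars) {
                int count = reader.read(buffer, filled, maxChars - filled);
                if (count == -1) {
                    break; // end of stream before maxChars were available
                }
                filled += count;
            }
        }
        return new String(buffer, 0, filled);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HeadReader <url>");
            return;
        }
        // The URL is taken from the command line; no particular server is assumed.
        URLConnection connection = URI.create(args[0]).toURL().openConnection();
        Charset charset = charsetOf(connection.getContentType());
        System.out.println(readHead(connection.getInputStream(), charset, 2000));
    }
}
```

Note that InputStreamReader is documented to read ahead more bytes than the current read strictly needs, so this bounds what your code consumes, not necessarily what the server sends before the stream is closed.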

Other than that, I'd strongly recommend using Apache HttpClient instead of java.net.URLConnection. It's much more flexible.


Edit: To understand the difference between Reader.read and IOUtils.read, it's worth examining the source of the latter:

public static int read(Reader input, char[] buffer,
                       int offset, int length)
    throws IOException
{
    if (length < 0) {
        throw new IllegalArgumentException("Length must not be negative: " + length);
    }
    int remaining = length;
    while (remaining > 0) {
        int location = length - remaining;
        int count = input.read(buffer, offset + location, remaining);
        if (EOF == count) { // EOF
            break;
        }
        remaining -= count;
    }
    return length - remaining;
}

Since Reader.read can read fewer characters than the given length (we only know it's at least 1 and at most the length), we need to keep calling it until we have the amount we want.
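
The same looping idea applies to raw bytes, which is what the question originally asked about. The following is a minimal sketch using only the JDK. The class name FirstBytes, the method readUpTo, and the trickle test stream are hypothetical and exist only to make the short-read behaviour visible without a network.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class FirstBytes {

    // Reads until `limit` bytes have arrived or the stream ends, and returns
    // how many bytes were stored in `buffer`. Requires limit <= buffer.length.
    static int readUpTo(InputStream in, byte[] buffer, int limit) throws IOException {
        int filled = 0;
        while (filled < limit) {
            int count = in.read(buffer, filled, limit - filled);
            if (count == -1) {
                break; // end of stream
            }
            filled += count;
        }
        return filled;
    }

    // Test stream that hands out at most 3 bytes per call, imitating a
    // network stream that delivers data in small pieces.
    static InputStream trickle(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10];
        byte[] head = new byte[8];
        int single = trickle(data).read(head);                    // one call: a short read
        int looped = readUpTo(trickle(data), head, head.length);  // loop until full
        System.out.println(single + " vs " + looped);             // prints "3 vs 8"
    }
}
```

On Java 9 and later, InputStream.readNBytes(byte[], int, int) performs an equivalent loop, so a hand-written helper like this is only needed on older runtimes.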

Upvotes: 5

JB Nizet

Reputation: 691765

No, it doesn't. It could read up to 8192 bytes (the default buffer size of BufferedInputStream). It could also read 0 bytes, or any number of bytes between 0 and 2000, since you don't check the number of bytes actually read, which the read() method returns.
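
The read-ahead can be observed locally with a small experiment. This is a sketch, not a measurement of real network traffic: CountingInputStream and measure are hypothetical helpers, and an in-memory array stands in for the connection's stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadAheadDemo {

    // Hypothetical wrapper that counts how many bytes are pulled from the source.
    static final class CountingInputStream extends FilterInputStream {
        long consumed = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b != -1) {
                consumed++;
            }
            return b;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            int count = super.read(b, off, len);
            if (count > 0) {
                consumed += count;
            }
            return count;
        }
    }

    // Returns {bytes delivered to the caller, bytes taken from the source}.
    static long[] measure(int sourceSize, int requested) throws IOException {
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(new byte[sourceSize]));
        byte[] head = new byte[requested];
        int delivered = new BufferedInputStream(source).read(head);
        return new long[] { delivered, source.consumed };
    }

    public static void main(String[] args) throws IOException {
        long[] result = measure(100_000, 2000);
        System.out.println("delivered=" + result[0] + ", consumed from source=" + result[1]);
    }
}
```

With a 2000-byte request, the source should report more bytes consumed than were delivered, because BufferedInputStream fills its internal buffer in one go.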

And finally, depending on the value of charset, and of the actual charset used by the HTTP response, this could return an incorrect string, or a String truncated in the middle of a multi-byte character. You should use a Reader to read text.
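
To illustrate the truncation problem, here is a small sketch that assumes UTF-8 and uses an in-memory byte array instead of a real HTTP response. TruncationDemo, decodePrefix and readChars are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class TruncationDemo {

    // Decodes only the first n bytes, as the byte-array approach does.
    static String decodePrefix(byte[] bytes, int n) {
        return new String(bytes, 0, n, StandardCharsets.UTF_8);
    }

    // Reads the first n characters through a Reader instead.
    static String readChars(byte[] bytes, int n) throws IOException {
        char[] chars = new char[n];
        int filled = 0;
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            while (filled < n) {
                int count = reader.read(chars, filled, n - filled);
                if (count == -1) {
                    break; // end of stream
                }
                filled += count;
            }
        }
        return new String(chars, 0, filled);
    }

    public static void main(String[] args) throws IOException {
        // 'a' is one byte in UTF-8, 'é' (U+00E9) is two, so this is 3 bytes.
        byte[] bytes = "a\u00e9".getBytes(StandardCharsets.UTF_8);
        // Cutting after 2 bytes splits 'é'; the decoder substitutes U+FFFD.
        System.out.println(decodePrefix(bytes, 2).equals("a\uFFFD")); // prints "true"
        // Reading 2 characters keeps both characters intact.
        System.out.println(readChars(bytes, 2).equals("a\u00e9"));    // prints "true"
    }
}
```

A Java char is a UTF-16 code unit, so a character limit can still split a surrogate pair for characters outside the Basic Multilingual Plane. The Reader approach only removes the byte-level truncation.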

I suggest you read the Java IO tutorial.

Upvotes: 7
