PepeHands

Reputation: 1406

URLConnection doesn't read whole page

In my app I need to download some web page. I do it in a way like this

URL url = new URL(myUrl);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setReadTimeout(5000);    // 5 seconds to download (the value is in milliseconds)
conn.setConnectTimeout(5000); // 5 seconds to connect
conn.setRequestMethod("GET");
conn.setDoInput(true);

conn.connect();
int response = conn.getResponseCode();
InputStream is = conn.getInputStream();

String s = readIt(is);
System.out.println("got: " + s);

My readIt function is:

public String readIt(InputStream stream) throws IOException {
    int len = 10000;
    Reader reader;
    reader = new InputStreamReader(stream, "UTF-8");
    char[] buffer = new char[len];
    reader.read(buffer);
    return new String(buffer);
}

The problem is that it doesn't download the whole page. For example, if myUrl is "https://wikipedia.org", then the output is cut off partway through the page: [screenshot of the output]

How can I download the whole page?

Update: The second answer to "Read/convert an InputStream to a String" solved my problem. The problem was in the readIt function. You should read the response from the InputStream like this:

static String convertStreamToString(java.io.InputStream is) {
   java.util.Scanner s = new java.util.Scanner(is).useDelimiter("\\A");
   return s.hasNext() ? s.next() : "";
}
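A note on why that one-liner works: "\\A" matches only at the beginning of the input, so the Scanner treats the entire stream as a single token. Below is a minimal sketch of a slightly more defensive variant. It assumes the page is UTF-8 (the same assumption readIt makes) and that the caller is happy for the stream to be closed afterwards; the class name ScannerSketch is made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Scanner;

class ScannerSketch {
    static String convertStreamToString(InputStream is) throws IOException {
        // Name the charset explicitly instead of relying on the platform default.
        // try-with-resources closes the Scanner, which also closes the stream.
        try (Scanner s = new Scanner(is, "UTF-8").useDelimiter("\\A")) {
            String text = s.hasNext() ? s.next() : "";
            // Scanner swallows I/O errors; surface the last one, if any.
            IOException failure = s.ioException();
            if (failure != null) {
                throw failure;
            }
            return text;
        }
    }
}
```

The ioException() check matters for network streams: without it, a connection that drops halfway through would just look like a shorter page.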

Upvotes: 4

Views: 1596

Answers (3)

Stephen C

Reputation: 719436

There are a number of mistakes in your code:

  1. You are reading into a character buffer with a fixed size.

  2. You are ignoring the result of the read(char[]) method. It returns the number of characters actually read ... and you need to use that.

  3. You are assuming that read(char[]) will read all of the data. In fact, it is only guaranteed to return at least one character ... or -1 to indicate that you have reached the end of stream. When you read from a network connection, you are liable to only get the data that has already been sent by the other end and buffered locally.

  4. When you create the String from the char[] you are assuming that every position in the character array contains a character from your stream.
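The mistakes listed above can be reproduced without a network. The following is only a sketch: TrickleReader is a made-up Reader that hands out at most 4 characters per call, standing in for a socket that has only received part of the response so far, and readOnce mirrors the single-read approach from the question.

```java
import java.io.IOException;
import java.io.Reader;

class ShortReadDemo {
    /** Hypothetical reader that returns at most 4 chars per call, like a slow connection. */
    static class TrickleReader extends Reader {
        private final String data;
        private int pos = 0;

        TrickleReader(String data) {
            this.data = data;
        }

        @Override
        public int read(char[] cbuf, int off, int len) {
            if (pos >= data.length()) {
                return -1; // end of stream
            }
            int n = Math.min(Math.min(len, 4), data.length() - pos);
            data.getChars(pos, pos + n, cbuf, off);
            pos += n;
            return n;
        }

        @Override
        public void close() {
            // nothing to release
        }
    }

    /** The question's approach: a single read whose return value is ignored. */
    static String readOnce(Reader reader, int bufferSize) throws IOException {
        char[] buffer = new char[bufferSize];
        reader.read(buffer);       // may fill only part of the buffer
        return new String(buffer); // unfilled slots stay as '\0'
    }
}
```

Calling readOnce(new ShortReadDemo.TrickleReader("hello world"), 8) gives back an 8-character string: "hell" followed by four NUL characters. The rest of the input is never read, and the padding ends up in the result.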

There are multiple ways to do it correctly, and this is one way:

public String readIt(InputStream stream) throws IOException {
    Reader reader = new InputStreamReader(stream, "UTF-8");
    char[] buffer = new char[4096];
    StringBuilder builder = new StringBuilder();
    int len;
    while ((len = reader.read(buffer)) > 0) {
        builder.append(buffer, 0, len);
    }
    return builder.toString();
}

Another way to do it is to look for a 3rd-party library that already provides a readFully(Reader)-style method, such as IOUtils.toString in Apache Commons IO.
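If adding a library is not an option, newer JDKs have a built-in equivalent. This is a sketch that assumes Java 9 or later (the question doesn't say which runtime is in use, and older Android versions don't have this method) and assumes the page is UTF-8; the class name ReadAllBytesSketch is made up.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ReadAllBytesSketch {
    static String readIt(InputStream stream) throws IOException {
        // readAllBytes() (Java 9+) loops internally until end of stream.
        byte[] bytes = stream.readAllBytes();
        // Decode once at the end, assuming the content is UTF-8.
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

This holds the whole response in memory, which is fine for a web page but not something to do with arbitrarily large downloads.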

Upvotes: 4

George Lee

Reputation: 826

You are reading at most 10000 characters from the input stream, and only with a single read call.

Use a BufferedReader to make your life easier.

public String readIt(InputStream stream) throws IOException {
    // Name the charset explicitly; otherwise the platform default is used.
    BufferedReader reader = new BufferedReader(new InputStreamReader(stream, "UTF-8"));
    StringBuilder out = new StringBuilder();
    String newLine = System.getProperty("line.separator");
    String line;
    // readLine() returns null at end of stream and strips the line terminator.
    while ((line = reader.readLine()) != null) {
        out.append(line);
        out.append(newLine);
    }
    return out.toString();
}

Upvotes: 0

Argha Sen

Reputation: 81

You need to read in a loop till there are no more bytes left in the InputStream.

    while (-1 != (len = in.read(buffer))) { /* do stuff here */ }
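To fill in the "do stuff here" part, here is one possible sketch. It collects the raw bytes first and decodes them afterwards, assuming the page is UTF-8 as in the question; the class name ByteLoopSketch and the 4096-byte buffer size are arbitrary choices.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ByteLoopSketch {
    static String readIt(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int len;
        // read() returns -1 only at end of stream; any other value is the
        // number of bytes placed in the buffer by this call.
        while (-1 != (len = in.read(buffer))) {
            out.write(buffer, 0, len); // keep only the bytes actually read
        }
        // Decode once at the end so a multi-byte character that straddles
        // two reads is not split.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Decoding each chunk separately with new String(buffer, 0, len, ...) would look simpler, but it can corrupt characters whose bytes arrive in different reads.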

Upvotes: 0
