Viktor Dahl

Reputation: 1990

How do I repeatedly read from an HttpURLConnection?

I've written a Java program which scrapes some content from a web page. It retrieves the content by calling the readWebpage method every couple of seconds. The problem I'm having is that only the first read actually works. After the first read, the InputStream always appears to be empty (in.ready() returns false).

Also, conn.getContentLength() returns the same value every time, even though the content on the page has changed. If I restart the program, the new content is fetched properly.

What have I missed? Do I have to perform some sort of refresh on the conn object?

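import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;

private String readWebpage(HttpURLConnection conn) throws IOException {
    conn.connect();
    // getContentLength() returns -1 when the length is unknown, so fall back to a default capacity.
    int length = conn.getContentLength();
    StringBuilder b = new StringBuilder(length > 0 ? length + 5 : 1024);
    // try-with-resources closes the reader and the underlying stream.
    try (BufferedReader buffer = new BufferedReader(
            new InputStreamReader(conn.getInputStream()))) {
        String line;
        while ((line = buffer.readLine()) != null) {
            b.append(line);
        }
    }
    return b.toString();
}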

Upvotes: 0

Views: 1821

Answers (2)

Jay

Reputation: 27474

Once you've read the entire page, what more is there to read? A single GET or POST request cannot result in multiple responses from the server. It sends one response back, end of story.

If the page is still updating, then either (a) the input is not finished, or (b) the further updates are something other than HTML, for example an applet or a JavaScript function that's talking to the server.

I think BufferedReader.readLine blocks as long as there's still input coming, so I don't think it could be (a). If the situation is (b), reading more HTML isn't going to help: that's not changing.

Upvotes: 0

Sanjay T. Sharma

Reputation: 23208

Are you passing in the same HttpURLConnection object every time? If so, that is the problem: each HttpURLConnection instance is used to make a single request, and its InputStream is tied to that one request, so you get the same (already consumed) stream every time rather than a new stream for the URL in question. Open a new connection (URL#openConnection) before each call to this method and you should be good to go.
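To illustrate the suggestion above, here is a minimal sketch that opens a fresh connection on every read. The class name PageReader and the method name fetchPage are hypothetical, and it assumes the page is UTF-8 encoded; adjust both to fit your program.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PageReader {
    // Hypothetical helper: takes the URL rather than a connection, so that
    // every call issues a new request through a new HttpURLConnection.
    static String fetchPage(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setUseCaches(false); // ask not to be served a cached copy
        // Assumption: the response body is UTF-8 text.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder b = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                b.append(line).append('\n'); // readLine() strips the line terminator
            }
            return b.toString();
        }
    }
}
```

In the polling loop you would then keep the URL object around and call PageReader.fetchPage(url) every couple of seconds, instead of holding on to one HttpURLConnection.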

Upvotes: 4
