Reputation: 1990
I've written a Java program that scrapes some content from a web page. It retrieves the content by calling the readWebpage method every couple of seconds. The problem is that only the first read actually works. After the first read, the InputStream always appears to be empty (in.ready() returns false).
Also, conn.getContentLength() returns the same value every time, even though the content on the page has changed. If I restart the program, the new content is fetched properly.
What have I missed? Do I have to perform some sort of refresh on the conn object?
    private String readWebpage(HttpURLConnection conn) throws IOException {
        conn.connect();
        InputStreamReader in = new InputStreamReader((InputStream) conn.getContent());
        BufferedReader buffer = new BufferedReader(in);
        StringBuilder b = new StringBuilder(conn.getContentLength() + 5);
        String line;
        while ((line = buffer.readLine()) != null) {
            b.append(line);
        }
        in.close();
        buffer.close();
        return b.toString();
    }
Upvotes: 0
Views: 1821
Reputation: 27474
Once you've read the entire screen, what more is there to read? A single get or post message cannot result in multiple transmissions from the server. It sends one message back, end of story.
If the screen is still updating, then either (a) the response has not finished arriving, or (b) the further updates come from something other than the HTML, such as an applet or a JavaScript function that talks to the server.
I think BufferedReader.readLine blocks as long as there's still input coming, so I don't think it could be (a). If the situation is (b), reading more HTML isn't going to help: that's not changing.
Upvotes: 0
Reputation: 23208
Are you passing in the same HttpURLConnection object every time? If so, then since the InputStream is tied to the underlying HTTP connection, you'll get the same InputStream every time rather than a new stream for the URL in question. Open a new connection (URL#openConnection) before passing it to this method and you should be good to go.
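To make that concrete, here is a minimal sketch of the method reworked to take a URL and open a fresh connection on each call. The class name PageFetcher, the UTF-8 charset, and the setUseCaches(false) call are my own assumptions, not something from the question; adjust them to whatever the page actually needs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is only for illustration.
final class PageFetcher {

    // Opens a new HttpURLConnection on every call, so each read gets its own stream.
    static String readWebpage(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setUseCaches(false); // assumption: we never want a cached copy
        // Assumption: the page is UTF-8; use the charset from the response if it differs.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                // Unlike the code in the question, keep a line separator between lines.
                sb.append(line).append('\n');
            }
            return sb.toString();
        } finally {
            conn.disconnect(); // release the connection once the body has been read
        }
    }
}
```

In the polling loop you would then keep the URL around and call PageFetcher.readWebpage(url) every couple of seconds, rather than holding on to a single HttpURLConnection.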
Upvotes: 4