Reputation: 959
I'm trying to make a method that downloads a web page. First, I create an HttpURLConnection. Second, I call the connect() method. Third, I read the data through a BufferedReader.
The problem is that with some pages I get reasonable reading times, but with some pages it's very slow (it can take about 10 minutes!). The slow pages are always the same, and they come from the same website. Opening those pages in a browser takes just a few seconds instead of 10 minutes. Here is the code:
static private String getWebPage(PageNode pagenode)
{
    String result;
    String inputLine;
    URI url;
    int cicliLettura=0;
    long startTime=0, endTime, openConnTime=0, connTime=0, readTime=0;
    try
    {
        if(Core.logGetWebPage())
            startTime=System.nanoTime();
        result="";
        url=pagenode.getUri();
        if(Core.logGetWebPage())
            openConnTime=System.nanoTime();
        HttpURLConnection yc = (HttpURLConnection) url.toURL().openConnection();
        if(url.toURL().getProtocol().equalsIgnoreCase("https"))
            yc=(HttpsURLConnection)yc;
        yc.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 (.NET CLR 3.5.30729)");
        yc.connect();
        if(Core.logGetWebPage())
            connTime=System.nanoTime();
        BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
        while ((inputLine = in.readLine()) != null)
        {
            result=result+inputLine+"\n";
            cicliLettura++;
        }
        if(Core.logGetWebPage())
            readTime=System.nanoTime();
        in.close();
        yc.disconnect();
        if(Core.logGetWebPage())
        {
            endTime=System.nanoTime();
            System.out.println(/*result+*/"getWebPage executed in "+(endTime-startTime)/1000000+" ms. Size: "+result.length()+" Response Code="+yc.getResponseCode()+" Protocol="+url.toURL().getProtocol()+" openConnTime: "+(openConnTime-startTime)/1000000+" connTime:"+(connTime-openConnTime)/1000000+" readTime:"+(readTime-connTime)/1000000+" cicliLettura="+cicliLettura);
        }
        return result;
    }
    catch(IOException e)
    {
        System.out.println("Exception: "+e.toString());
        e.printStackTrace();
        return null;
    }
}
Here are two log samples.

One of the "normal" pages:

getWebPage executed Size: 48261 Response Code=200 Protocol=http openConnTime: 0 connTime:1 readTime:569 cicliLettura=359

One of the "slow" pages (http://ricette.giallozafferano.it/Pan-di-spagna-al-cacao.html/allcomments) looks like this:

getWebPage executed Size: 1748261 Response Code=200 Protocol=http openConnTime: 0 connTime:1 readTime:596834 cicliLettura=35685
Upvotes: 0
Views: 370
Reputation: 1483
What you're likely seeing here is a result of the way you are collating result. Remember that Strings in Java are immutable - therefore when string concatenation occurs, a new String has to be instantiated, which can often involve copying all the data contained in that String. You have the following code executing for every line:
result=result+inputLine+"\n";
Under the covers, this line involves:

- A new StringBuffer is created, containing result so far
- inputLine is appended to the StringBuffer
- The StringBuffer is converted to a String
- A new StringBuffer is created for that String
- "\n" is appended to the StringBuffer
- The StringBuffer is converted to a String
- The resulting String is stored as result.

This operation becomes more and more time-consuming as result gets bigger and bigger - and your logs appear to show (albeit from a sample of 2!) that read times increase drastically with page size.
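As a rough sketch of the hidden cost (tmp is just an illustrative name; a modern compiler actually emits a StringBuilder, but the copying is the same), each iteration effectively does something like:

StringBuffer tmp = new StringBuffer(result); // copies every character accumulated so far
tmp.append(inputLine);                       // appends the new line
tmp.append("\n");
result = tmp.toString();                     // copies everything again into a new String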
Instead, use a StringBuffer directly:
StringBuffer buffer = new StringBuffer();
while ((inputLine = in.readLine()) != null)
{
    buffer.append(inputLine).append('\n');
    cicliLettura++;
}
String result = buffer.toString();
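This turns the read loop from quadratic to roughly linear in the page size. If thread safety isn't a concern here, StringBuilder - the unsynchronized counterpart of StringBuffer - works just as well.

For what it's worth, the difference is easy to reproduce offline. A quick, unscientific micro-benchmark along these lines (the class name, line count and line content are made-up stand-ins for the slow page) shows the concatenation loop taking orders of magnitude longer than the buffer loop:

public class ConcatTest
{
    public static void main(String[] args)
    {
        final int lines = 35685; // cicliLettura from the slow page
        // made-up stand-in for one line of HTML (~49 chars, like the slow page's average)
        final String inputLine = "<td class='x'>some fifty characters of markup</td>";

        long t0 = System.nanoTime();
        String result = "";
        for (int i = 0; i < lines; i++)
            result = result + inputLine + "\n"; // copies the whole accumulated string every pass; may take a while
        long t1 = System.nanoTime();

        StringBuffer buffer = new StringBuffer();
        for (int i = 0; i < lines; i++)
            buffer.append(inputLine).append('\n'); // amortised constant-time append
        String result2 = buffer.toString();
        long t2 = System.nanoTime();

        System.out.println("concat: " + (t1 - t0) / 1000000 + " ms, buffer: "
                + (t2 - t1) / 1000000 + " ms (sizes " + result.length() + " / " + result2.length() + ")");
    }
}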
Upvotes: 1