Reputation: 2078
I'm creating a small app that measures how long it takes to load an HTML document, checking every x seconds.
I'm using jsoup in a loop:
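import java.io.IOException;
import java.net.SocketTimeoutException;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

for (int i = 0; i < totalGets; i++) {
    // Reset each iteration so a failed request never reports the previous response
    Connection.Response response = null;
    long startTime = System.nanoTime(); // monotonic clock, meant for measuring elapsed time
    try {
        response = Jsoup.connect(url)
                .userAgent(USER_AGENT) // just using a Firefox user-agent
                .timeout(30_000)       // milliseconds
                .execute();
    } catch (SocketTimeoutException e) {
        System.out.println("Request timed out after 30 seconds!");
    } catch (IOException e) {
        System.out.println("Request failed: " + e);
    }
    long elapsedMs = (System.nanoTime() - startTime) / 1_000_000;
    if (response != null) {
        System.out.println("Response time: " + elapsedMs + "ms" + "\tResponse code: " + response.statusCode());
    }
    sleep(2000); // pause 2 s between requests (helper not shown in the question)
}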
The issue I'm having is that the very first execution of the jsoup connection is always slower than all subsequent ones, no matter which website I test.
Here is my output for https://www.google.com:
Response time: 934ms Response code: 200
Response time: 149ms Response code: 200
Response time: 122ms Response code: 200
Response time: 136ms Response code: 200
Response time: 128ms Response code: 200
Here is what I get for http://stackoverflow.com:
Response time: 440ms Response code: 200
Response time: 182ms Response code: 200
Response time: 187ms Response code: 200
Response time: 193ms Response code: 200
Response time: 185ms Response code: 200
Why is it always faster after the first connection? Is there a better way to determine the document's load speed?
Upvotes: 5
Views: 2511
Reputation: 1424
Another potential reason is that the JVM performs JIT (just-in-time) compilation in the background, turning Java bytecode into native instructions to improve speed.
The first time you run the code it's slow because it hasn't been optimized yet. The following rounds are faster because the optimization has already been done.
Parsing an HTML page is a fairly computationally intensive job.
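One way to keep these one-time costs out of the numbers is to run the request a few times untimed before measuring, and to summarize the timed runs with a median rather than trusting any single run. The sketch below is a generic harness, not jsoup code: the class and method names are hypothetical, and the CPU-bound workload in main is only a stand-in for the Jsoup.connect(url)...execute() call (which would need its own try/catch for IOException inside the Runnable).

```java
import java.util.Arrays;

public class WarmupTimer {

    /**
     * Runs task `warmups` times without timing it, then `runs` more times,
     * returning the elapsed milliseconds of each measured run.
     */
    static long[] measure(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run(); // one-time costs (class loading, JIT, connection setup) land here, unrecorded
        }
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime(); // monotonic clock, intended for elapsed time
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    /** Median of the values; for an even count, the integer mean of the two middle values. */
    static long median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in workload for the HTTP request being timed
        Runnable task = () -> {
            double sum = 0;
            for (int i = 1; i < 2_000_000; i++) {
                sum += Math.sqrt(i);
            }
            if (sum < 0) {
                System.out.println(sum); // never true; uses the result so the loop is not dead code
            }
        };
        long[] times = measure(task, 1, 5);
        System.out.println("Measured runs (ms): " + Arrays.toString(times)
                + ", median: " + median(times) + " ms");
    }
}
```

The median is used because a single slow outlier (a GC pause, a network hiccup) moves it far less than it moves the mean.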
Upvotes: 1
Reputation: 2509
In addition to @luksch's points, I think there is another factor: Java keeps the connection alive for a few seconds, which may save protocol round trips on later requests.
If you add .header("Connection", "close") to the request, you'll see more consistent times.
You can check that connections are kept alive with a packet sniffer. At least, I can see the same port numbers (the source port, of course) being reused.
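The port reuse can also be observed without a sniffer. This is a sketch under stated assumptions: it uses the JDK's own HttpURLConnection against a throwaway local com.sun.net.httpserver.HttpServer instead of jsoup, and KeepAliveDemo / observedSourcePorts are hypothetical names. The server records the client's source port for every request, so a reused connection shows up as a repeated port.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class KeepAliveDemo {

    /**
     * Starts a local HTTP server, sends n sequential GET requests to it with
     * HttpURLConnection, and returns the client source port the server saw for each.
     */
    static List<Integer> observedSourcePorts(int n) {
        List<Integer> ports = new CopyOnWriteArrayList<>();
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                ports.add(exchange.getRemoteAddress().getPort()); // client's source port
                byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length); // fixed Content-Length
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();
            try {
                String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
                for (int i = 0; i < n; i++) {
                    HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
                    // Reading the body to the end and closing the stream lets the JDK return
                    // the socket to its keep-alive cache for the next request to reuse.
                    try (InputStream in = conn.getInputStream()) {
                        in.readAllBytes();
                    }
                }
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ports;
    }

    public static void main(String[] args) {
        System.out.println("Source ports seen by the server: " + observedSourcePorts(5));
    }
}
```

With default keep-alive behaviour the list should typically show one port repeated. Adding conn.setRequestProperty("Connection", "close") before getInputStream() is the HttpURLConnection analogue of the jsoup header above and should typically give a new connection, and a new port, per request.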
EDIT:
Another thing that may add time to the first request is the DNS lookup.
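To keep the lookup out of the first timed request, one option is to resolve the host once before the loop starts. A minimal sketch, assuming jsoup's requests resolve names through the JDK's InetAddress and therefore share its lookup cache; DnsWarmup and resolveMillis are hypothetical names, and main simply resolves the question's host twice so the two timings can be compared.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsWarmup {

    /** Resolves host once and returns how long the lookup took, in milliseconds. */
    static double resolveMillis(String host) {
        long start = System.nanoTime();
        try {
            InetAddress.getAllByName(host);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Cannot resolve " + host, e);
        }
        return (System.nanoTime() - start) / 1e6;
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "stackoverflow.com";
        // The first lookup may have to ask the system resolver; the JVM caches successful
        // lookups for a while (networkaddress.cache.ttl), so the repeat is usually much faster.
        System.out.println("First lookup:  " + resolveMillis(host) + " ms");
        System.out.println("Second lookup: " + resolveMillis(host) + " ms");
    }
}
```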
Upvotes: 3
Reputation: 11712
1. Jsoup must run some boilerplate initialization code before the first request can be fired. I would not count the first request in your measurements, since all that initialization will skew its time.
2. As mentioned in the comments, many websites cache responses for a couple of seconds. Depending on the website you want to measure, you can use some tricks to get the web server to produce a fresh page each time. One such trick is to add a timestamp parameter; usually a parameter named _ is used for that (like http://url/path/?parameter1=val1&_=ts). Or you could send no-cache headers in the HTTP request. However, none of these tricks can force a web server to behave the way you want. So you could also wait longer than 30 seconds between requests.
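The timestamp trick from point 2 amounts to appending _=<timestamp> to the URL's query string. A small sketch of that string handling; CacheBuster and withCacheBuster are hypothetical names, and the helper only builds the URL, it cannot make the server honour it.

```java
public class CacheBuster {

    /**
     * Returns url with an extra query parameter _=<timestamp>, keeping any
     * existing query parameters and keeping a #fragment (if present) at the end.
     */
    static String withCacheBuster(String url, long timestamp) {
        int hash = url.indexOf('#');
        String base = hash >= 0 ? url.substring(0, hash) : url;
        String fragment = hash >= 0 ? url.substring(hash) : "";
        String separator;
        if (!base.contains("?")) {
            separator = "?";                 // no query string yet
        } else if (base.endsWith("?") || base.endsWith("&")) {
            separator = "";                  // query string already open
        } else {
            separator = "&";                 // append to existing parameters
        }
        return base + separator + "_=" + timestamp + fragment;
    }

    public static void main(String[] args) {
        System.out.println(withCacheBuster("http://url/path/?parameter1=val1", System.currentTimeMillis()));
    }
}
```

In the question's loop this would be Jsoup.connect(withCacheBuster(url, System.currentTimeMillis())). The no-cache request headers mentioned above could be added with jsoup's .header(...), e.g. .header("Cache-Control", "no-cache"); as noted, neither trick obliges the server to bypass its cache.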
Upvotes: 3