Andrio

Reputation: 2078

Using Jsoup connect() in a loop: the first request is always much slower than all subsequent ones

I'm creating a small app that measures how long an HTML document takes to load, checking every x seconds.

I'm using Jsoup in a loop:

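    // Needs: org.jsoup.Connection, org.jsoup.Jsoup, java.io.IOException, java.net.SocketTimeoutException
    for (int i = 0; i < totalGets; i++) {
        long startTime = System.nanoTime();  // monotonic clock, suited to measuring elapsed time

        try {
            Connection.Response response = Jsoup.connect(url)
                    .userAgent(USER_AGENT)  // just using a Firefox user-agent
                    .timeout(30_000)
                    .execute();

            long elapsedMs = (System.nanoTime() - startTime) / 1_000_000;
            System.out.println("Response time: " + elapsedMs + "ms" + "\tResponse code: " + response.statusCode());
        } catch (SocketTimeoutException e) {
            // covers both connect and read timeouts
            System.out.println("Request timed out after 30 seconds!");
        } catch (IOException e) {
            // report the failure instead of printing a stale (or null) response
            System.out.println("Request failed: " + e.getMessage());
        }

        Thread.sleep(2000);  // the enclosing method must handle InterruptedException
    }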

The issue I'm having is that the very first Jsoup request is always slower than all subsequent ones, no matter which website I test.

Here is my output for https://www.google.com:

    Response time: 934ms    Response code: 200
    Response time: 149ms    Response code: 200
    Response time: 122ms    Response code: 200
    Response time: 136ms    Response code: 200
    Response time: 128ms    Response code: 200

Here is what I get for http://stackoverflow.com:

    Response time: 440ms    Response code: 200
    Response time: 182ms    Response code: 200
    Response time: 187ms    Response code: 200
    Response time: 193ms    Response code: 200
    Response time: 185ms    Response code: 200

Why is it always faster after the first request? Is there a better way to determine the document's load speed?

Upvotes: 5

Views: 2511

Answers (3)

Pegasis

Reputation: 1424

Another potential reason is that the JVM performs JIT compilation in the background, turning Java bytecode into native machine code to improve speed.

The first time you run the code it is slow because it hasn't been compiled yet. The following rounds are faster because the compilation has already been done.

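Parsing an HTML page is a fairly computationally intensive job. (Strictly speaking, execute() only fetches the response; Jsoup does not parse the HTML until response.parse() is called.)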
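To see the JIT warm-up effect in isolation, here is a minimal sketch that uses only the JDK (no Jsoup, no network) and times the same CPU-bound task several times in a row. The countTags method is a hypothetical stand-in for parsing work, not part of Jsoup. Typically the first round is the slowest because it starts out interpreted, but the exact numbers depend on the machine and JVM.

```java
import java.util.Arrays;

public class WarmUpDemo {

    // Written by every round so the JIT cannot discard the work as dead code.
    static volatile int sink;

    // Hypothetical stand-in for parsing work: counts opening tags such as <div> or <p>.
    static int countTags(String html) {
        int count = 0;
        for (int i = 0; i + 1 < html.length(); i++) {
            if (html.charAt(i) == '<' && html.charAt(i + 1) != '/') {
                count++;
            }
        }
        return count;
    }

    // Runs countTags `rounds` times and returns the duration of each round in microseconds.
    static long[] timeRounds(String html, int rounds) {
        long[] micros = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink = countTags(html);
            micros[r] = (System.nanoTime() - start) / 1_000;
        }
        return micros;
    }

    public static void main(String[] args) {
        String html = "<div><p>hello</p></div>".repeat(10_000); // synthetic document
        // Time first, so no earlier call has already warmed up countTags.
        long[] times = timeRounds(html, 8);
        System.out.println("round times (us): " + Arrays.toString(times));
        System.out.println("opening tags: " + sink);
    }
}
```

Because nothing leaves the process, any slowdown in the first round here comes from the JVM itself rather than from the server or the network.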

Upvotes: 1

fonkap

Reputation: 2509

I think that, in addition to @luksch's points, there is another factor: I think Java is keeping the connection alive for a few seconds (HTTP keep-alive), so later requests skip the TCP (and, for HTTPS, TLS) handshake round trips.

If you add .header("Connection", "close") to the request, you'll see more consistent times.

You can check that connections are kept alive with a packet sniffer. At least I can see the same source port numbers being reused.
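The port reuse can also be observed locally without a sniffer. This is a sketch under stated assumptions: it uses the JDK's built-in com.sun.net.httpserver.HttpServer and HttpURLConnection instead of Jsoup, so it needs no extra dependencies, and it records the client source port the server sees for each request. With keep-alive the same port typically shows up for every request; with a Connection: close header each request should arrive from a new port.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class KeepAliveDemo {

    // Starts a throwaway local server, sends `requests` GET requests to it and
    // returns the client (source) port the server saw for each request.
    static List<Integer> sourcePorts(int requests, boolean closeConnection) throws IOException {
        List<Integer> ports = new ArrayList<>();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            synchronized (ports) {
                ports.add(exchange.getRemoteAddress().getPort());
            }
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/").toURL();
            for (int i = 0; i < requests; i++) {
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                if (closeConnection) {
                    // The same header the answer suggests setting via Jsoup's .header(...)
                    conn.setRequestProperty("Connection", "close");
                }
                try (InputStream in = conn.getInputStream()) {
                    in.readAllBytes(); // drain the body so the connection can be cached and reused
                }
            }
        } finally {
            server.stop(0);
        }
        synchronized (ports) {
            return new ArrayList<>(ports);
        }
    }

    static int distinct(List<Integer> ports) {
        return new HashSet<>(ports).size();
    }

    public static void main(String[] args) throws IOException {
        List<Integer> keepAlive = sourcePorts(5, false);
        List<Integer> close = sourcePorts(5, true);
        System.out.println("keep-alive:        " + keepAlive + " -> " + distinct(keepAlive) + " distinct port(s)");
        System.out.println("Connection: close: " + close + " -> " + distinct(close) + " distinct port(s)");
    }
}
```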

EDIT:

Another thing that may add time to the first request is the DNS lookup; the JVM caches successful lookups, so later requests don't pay for it again.
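A quick way to check how much the DNS lookup contributes is to time two consecutive resolutions of the same host with InetAddress. This is a minimal sketch; the default host name is only an example, and how long the JVM keeps a cached result depends on the networkaddress.cache.ttl security property.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsTiming {

    // Resolves `host` and returns how long the lookup took, in microseconds.
    static long lookupMicros(String host) throws UnknownHostException {
        long start = System.nanoTime();
        InetAddress.getAllByName(host);
        return (System.nanoTime() - start) / 1_000;
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = args.length > 0 ? args[0] : "www.google.com"; // example host
        // The second call is usually served from the JVM's lookup cache.
        System.out.println("first lookup:  " + lookupMicros(host) + " us");
        System.out.println("second lookup: " + lookupMicros(host) + " us");
    }
}
```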

Upvotes: 3

luksch

Reputation: 11712

1. Jsoup must run some boilerplate initialization code before the first request can be sent. I would not include the first request in your measurements, since all that initialization skews the first request's time.

2. As mentioned in the comments, many websites cache responses for a couple of seconds. Depending on the website you want to measure, you can use some tricks to get the web server to produce a fresh page each time. One such trick is to add a timestamp parameter; _ is commonly used for that (like http://url/path/?parameter1=val1&_=ts). Alternatively, you could send no-cache headers with the HTTP request. However, none of these tricks can force a web server to behave the way you want. So you could also simply wait longer than 30 seconds between requests.
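Both suggestions can be sketched as two small helpers: one appends the _=timestamp cache-busting parameter, the other reports the median response time after discarding the warm-up requests. Both helper names are hypothetical, and the URL handling assumes the URL has no #fragment.

```java
import java.util.List;

public class LoadTimeStats {

    // Appends a cache-busting "_=<timestamp>" parameter (assumes no #fragment in the URL).
    static String withCacheBuster(String url, long timestamp) {
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + "_=" + timestamp;
    }

    // Median of the samples after dropping the first `warmUp` ones.
    static double medianAfterWarmUp(List<Long> samplesMs, int warmUp) {
        if (samplesMs.size() <= warmUp) {
            throw new IllegalArgumentException("need more samples than warm-up rounds");
        }
        long[] kept = samplesMs.subList(warmUp, samplesMs.size()).stream()
                .mapToLong(Long::longValue)
                .sorted()
                .toArray();
        int mid = kept.length / 2;
        return kept.length % 2 == 1 ? kept[mid] : (kept[mid - 1] + kept[mid]) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(withCacheBuster("http://url/path/?parameter1=val1", System.currentTimeMillis()));
        // Response times from the question's google.com run
        List<Long> google = List.of(934L, 149L, 122L, 136L, 128L);
        System.out.println("median without first request: " + medianAfterWarmUp(google, 1) + " ms");
    }
}
```

In the question's loop, the cache-busted URL would be passed as Jsoup.connect(withCacheBuster(url, System.currentTimeMillis())), optionally together with a request header such as .header("Cache-Control", "no-cache"); whether the server honors either is up to the server.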

Upvotes: 3
