Sulteric

Reputation: 527

Looking for an alternate way to validate URLs in Java

I'm using HttpURLConnection to validate URLs coming out of a database. With certain URLs I get an exception; I assume the requests are timing out, even though the URLs are in fact reachable (no 400-range error).

Increasing the timeout doesn't seem to matter; I still get an exception. Is there a second check I could do in the catch block to verify whether the URL is in fact bad? The relevant code is below. It works with 99.9% of URLs; the problem is the remaining 0.1%.

try {
    HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
    connection.setConnectTimeout(timeout);
    connection.setReadTimeout(timeout);
    connection.setRequestMethod("GET");
    connection.setRequestProperty("User-Agent",
            "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.13) Gecko/2009073021 Firefox/3.0.13");
    connection.connect();
    int responseCode = connection.getResponseCode();
    // Any 4xx or 5xx status counts as a failure.
    if (responseCode >= 400)
    {
        String prcMessage = "ERROR: URL " + url + " not found, response code was " + responseCode + "\r";
        System.out.println(prcMessage);
        VerifyUrl.writeToFile(prcMessage);
        return false;
    }
}
catch (IOException exception) 
{
    String errorMessage = "ERROR: URL " + url + " did not load in the given time of " + timeout + " milliseconds.";
    System.out.println(errorMessage);
    VerifyUrl.writeToFile(errorMessage);
    return false;
}

Upvotes: 1

Views: 3298

Answers (1)

getjackx

Reputation: 345

It depends on what you want to check, but I think the question "Validating URL in Java" has you covered.

You have two possibilities:

  1. Check syntax ("Is this URL a real URL or just made up?")

    There is a lot of material describing how to do this; search for RFC 3986 as a starting point. Someone has probably implemented a check like this already.

  2. Check the semantics ("Is the URL available?")

    There is not really a faster way to do that, though there are different tools available for sending an HTTP request in Java. You may send a HEAD request instead of GET, as HEAD omits the HTTP body and may result in faster requests and fewer timeouts.
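
Both checks can be sketched with the JDK alone. This is a rough sketch, not a definitive implementation: the class name UrlCheck and its two methods are hypothetical, the syntax check relies on java.net.URI (which follows the older RFC 2396 rather than RFC 3986), and treating every status below 400 as "reachable" is an assumption. The catch block prints the exception itself, since not every IOException is a timeout.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper; the class and method names are illustrative only.
final class UrlCheck {

    // Syntax check: the string must parse as a URI and have an
    // http or https scheme plus a host.
    static boolean isSyntacticallyValid(String url) {
        try {
            URI uri = new URI(url);
            String scheme = uri.getScheme();
            return uri.getHost() != null
                    && ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme));
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Availability check: send a HEAD request and treat any status
    // below 400 as reachable (an assumption, adjust as needed).
    static boolean isReachable(String url, int timeoutMillis) {
        if (!isSyntacticallyValid(url)) {
            return false;
        }
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) URI.create(url).toURL().openConnection();
            connection.setConnectTimeout(timeoutMillis);
            connection.setReadTimeout(timeoutMillis);
            connection.setRequestMethod("HEAD");
            return connection.getResponseCode() < 400;
        } catch (IOException e) {
            // Print the exception type and message: the cause may be DNS,
            // TLS, or a refused connection rather than a timeout.
            System.out.println("ERROR: URL " + url + " failed: " + e);
            return false;
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}
```

Some servers reject HEAD (for example with 405 Method Not Allowed), so a fallback to GET for those responses may be worth adding.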

Upvotes: 2
