Reputation: 51

How can I validate a URL in C# to avoid 404 errors?

I need to write a tool in C# that reports broken URLs. A URL should only be reported as broken if the user would see a 404 error in the browser. I believe there might be tricks for handling web servers that do URL rewriting. Here's what I have. As you can see, some URLs validate incorrectly.

string url = "";

// TEST CASES
//url = "http://newsroom.lds.org/ldsnewsroom/eng/news-releases-stories/local-churches-teach-how-to-plan-for-disasters";   // Prints "BROKEN", although the server rewrites this to the good URL below.
//url = "http://beta-newsroom.lds.org/article/local-churches-teach-how-to-plan-for-disasters";  // Prints "GOOD"
//url = "http://";     // Prints "BROKEN"
//url = "google.com";     // Prints "BROKEN", although this should be good.
//url = "www.google.com";     // Prints "BROKEN", although this should be good.
//url = "http://www.google.com";     // Prints "GOOD"

try
{

    if (url != "")
    {
        WebRequest Irequest = WebRequest.Create(url);
        WebResponse Iresponse = Irequest.GetResponse();
        if (Iresponse != null)
        {
            _txbl.Text = "GOOD";
        }
    }
}
catch (Exception ex)
{
    _txbl.Text = "BROKEN";
}

Upvotes: 5

Answers (3)

Mike Atlas

Reputation: 8241

For one, Irequest and Iresponse shouldn't be named like that. They should just be webRequest and webResponse, or even just request and response. The capital "I" prefix is generally only used for interface naming, not for instance variables.

To do your URL validity checking, use UriBuilder to get a Uri. Then you should use HttpWebRequest and HttpWebResponse so that you can check the strongly typed status code response. Finally, you should be a bit more informative about what was broken.

The additional .NET types I introduced are UriBuilder, HttpWebRequest, HttpWebResponse, and HttpStatusCode.

Sample:

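try
{
    if (!string.IsNullOrEmpty(url))
    {
        // UriBuilder prepends "http://" when the string has no scheme.
        UriBuilder uriBuilder = new UriBuilder(url);
        // WebRequest.Create returns a WebRequest, so a cast is required.
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uriBuilder.Uri);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            if (response.StatusCode == HttpStatusCode.OK)
            {
                _txbl.Text = "URL appears to be good.";
            }
            else //There are a lot of other status codes you could check for...
            {
                _txbl.Text = string.Format("URL might be ok. Status: {0}.",
                                           response.StatusCode);
            }
        }
    }
}
catch (WebException ex)
{
    // GetResponse throws for 4xx/5xx statuses; the response, if any, is on the exception.
    HttpWebResponse errorResponse = ex.Response as HttpWebResponse;
    if (errorResponse != null && errorResponse.StatusCode == HttpStatusCode.NotFound)
    {
        _txbl.Text = "Broken - 404 Not Found";
    }
    else
    {
        _txbl.Text = string.Format("Broken - Other error: {0}", ex.Message);
    }
}
catch (Exception ex)
{
    // Covers malformed input such as a UriFormatException from UriBuilder.
    _txbl.Text = string.Format("Broken - Other error: {0}", ex.Message);
}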
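
The else branch in the sample notes that there are many other status codes to check for. As a rough sketch of one way to group them, the helper below maps a status code to a verdict by range. The `LinkStatus` class and `Describe` method are hypothetical names, not part of .NET, and the grouping is an assumption based on the question's rule that only a 404 counts as broken.

```csharp
using System.Net;

static class LinkStatus
{
    // Hypothetical helper: maps a status code to a short verdict.
    // Only 404 is reported as broken, matching the question's requirement.
    public static string Describe(HttpStatusCode status)
    {
        int code = (int)status;
        if (status == HttpStatusCode.NotFound) return "Broken - 404 Not Found";
        if (code >= 200 && code < 300) return "URL appears to be good.";
        if (code >= 300 && code < 400) return "Redirected. Status: " + code;
        return "URL might be ok. Status: " + code;
    }
}
```

Since HttpWebRequest follows redirects by default (AllowAutoRedirect is true), the 3xx branch would normally only be hit if that setting is turned off.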

Upvotes: 8

rkg

Reputation: 5719

Prepend http:// or https:// to the URL and pass it to the WebClient.OpenRead method. It throws a WebException if the URL is malformed or the resource cannot be downloaded.

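private WebClient webClient = new WebClient();

try
{
    // Only the success of the call matters here, so dispose the stream right away.
    using (Stream strm = webClient.OpenRead(URL)) { }
}
catch (WebException)
{
    // Rethrow without resetting the stack trace.
    throw;
}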
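
The prepending step could look something like the sketch below. `UrlPrefix` and `EnsureScheme` are hypothetical names, and defaulting to http:// rather than https:// is an assumption; the helper only adds a scheme when neither one is already present.

```csharp
using System;

static class UrlPrefix
{
    // Hypothetical helper: prepends "http://" only when no http/https scheme is present.
    public static string EnsureScheme(string url)
    {
        string trimmed = url.Trim();
        if (trimmed.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
            trimmed.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
        {
            return trimmed;
        }
        return "http://" + trimmed;
    }
}
```

The result of `UrlPrefix.EnsureScheme(URL)` would then be what gets passed to OpenRead.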

Upvotes: 0

James Hulse

Reputation: 1581

I believe the problem is that most of those 'should be good' cases are actually handled at the browser level. If you omit the 'http://', it's an invalid request, but the browser adds it for you.

So maybe you could do a similar check that the browser would do:

Ensure there is an 'http://' at the beginning
Ensure there is a 'www.' at the beginning

Upvotes: -1
