Reputation: 51
I need to write a tool in C# that reports broken URLs. A URL should only be reported as broken if the user would see a 404 error in the browser. I believe there might be tricks needed to handle web servers that do URL rewriting. Here's what I have. As you can see, some URLs validate incorrectly.
string url = "";
// TEST CASES
//url = "http://newsroom.lds.org/ldsnewsroom/eng/news-releases-stories/local-churches-teach-how-to-plan-for-disasters"; // Prints "BROKEN", although this is rewritten to the good URL below.
//url = "http://beta-newsroom.lds.org/article/local-churches-teach-how-to-plan-for-disasters"; // Prints "GOOD"
//url = "http://"; // Prints "BROKEN"
//url = "google.com"; // Prints "BROKEN", although this should be good.
//url = "www.google.com"; // Prints "BROKEN", although this should be good.
//url = "http://www.google.com"; // Prints "GOOD"
try
{
    if (url != "")
    {
        WebRequest Irequest = WebRequest.Create(url);
        WebResponse Iresponse = Irequest.GetResponse();
        if (Iresponse != null)
        {
            _txbl.Text = "GOOD";
        }
    }
}
catch (Exception ex)
{
    _txbl.Text = "BROKEN";
}
Upvotes: 5
Views: 16757
Reputation: 8241
For one, Irequest and Iresponse shouldn't be named like that. They should just be webRequest and webResponse, or even just request and response. The capital "I" prefix is generally only used for interface naming, not for instance variables.
To do your URL validity checking, use UriBuilder to get a Uri. Then you should use HttpWebRequest and HttpWebResponse so that you can check the strongly typed status code response. Finally, you should be a bit more informative about what was broken.
Sample:
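try
{
    if (!string.IsNullOrEmpty(url))
    {
        UriBuilder uriBuilder = new UriBuilder(url);
        // WebRequest.Create returns the base type, so a cast is needed.
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uriBuilder.Uri);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            if (response.StatusCode == HttpStatusCode.OK)
            {
                _txbl.Text = "URL appears to be good.";
            }
            else //There are a lot of other status codes you could check for...
            {
                _txbl.Text = string.Format("URL might be ok. Status: {0}.",
                    response.StatusCode);
            }
        }
    }
}
catch (WebException ex)
{
    // GetResponse throws on 4xx/5xx; the status code is on the exception's response.
    HttpWebResponse errorResponse = ex.Response as HttpWebResponse;
    if (errorResponse != null && errorResponse.StatusCode == HttpStatusCode.NotFound)
    {
        _txbl.Text = "Broken - 404 Not Found";
    }
    else
    {
        _txbl.Text = string.Format("Broken - Other error: {0}", ex.Message);
    }
}
catch (Exception ex)
{
    _txbl.Text = string.Format("Broken - Other error: {0}", ex.Message);
}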
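To illustrate why UriBuilder helps with the scheme-less test cases from the question, here is a minimal sketch. The class and method names (UriBuilderDemo, ToAbsolute) are made up for this example; the only behavior it relies on is that the UriBuilder(string) constructor defaults to the http scheme when the input has none.

```csharp
using System;

public static class UriBuilderDemo
{
    // Hypothetical helper: returns the absolute URI that UriBuilder
    // produces for the given input string.
    public static string ToAbsolute(string url)
    {
        // UriBuilder prepends "http://" when the input has no scheme.
        return new UriBuilder(url).Uri.AbsoluteUri;
    }
}
```

Calling UriBuilderDemo.ToAbsolute("google.com") should give "http://google.com/", which WebRequest.Create can then accept.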
Upvotes: 8
Reputation: 5719
Prepend http:// or https:// to the URL and pass it to the WebClient.OpenRead method. It throws a WebException if the URL is malformed or the request fails.
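private WebClient webClient = new WebClient();

try
{
    // Dispose the stream once the request has succeeded.
    using (Stream strm = webClient.OpenRead(URL))
    {
    }
}
catch (WebException)
{
    throw; // rethrow without resetting the stack trace
}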
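Since the question only wants a URL reported as broken on a 404, the WebException can be inspected rather than simply rethrown. The following is a rough sketch, not a definitive implementation; WebErrors, IsNotFound and IsBroken are hypothetical names introduced for illustration.

```csharp
using System;
using System.Net;

public static class WebErrors
{
    // Hypothetical helper: true only when the server answered with 404.
    public static bool IsNotFound(WebException ex)
    {
        // Response is null when no HTTP response was received
        // (for example a DNS failure or a timeout).
        HttpWebResponse response = ex.Response as HttpWebResponse;
        return response != null && response.StatusCode == HttpStatusCode.NotFound;
    }

    // Hypothetical wrapper: treats a URL as broken only on a 404.
    // Assumption: every other failure is deliberately not counted as broken.
    public static bool IsBroken(string url)
    {
        using (WebClient client = new WebClient())
        {
            try
            {
                using (client.OpenRead(url)) { }
                return false;
            }
            catch (WebException ex)
            {
                return IsNotFound(ex);
            }
        }
    }
}
```

Whether timeouts or server errors should also count as broken is a policy choice; this sketch follows the question's 404-only rule.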
Upvotes: 0
Reputation: 1581
The problem is that most of those 'should be good' cases are actually dealt with at the browser level, I believe. If you omit the 'http://', it's an invalid request, but the browser puts it in for you.
So maybe you could do a similar check that the browser would do:
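One way such a browser-style check might look is sketched below. This is an assumption about what a browser roughly does, not a description of any real browser's logic, and UrlNormalizer.Normalize is a hypothetical name: it prepends http:// when no scheme separator is present, then lets Uri.TryCreate decide whether the result is a usable http or https URL.

```csharp
using System;

public static class UrlNormalizer
{
    // Hypothetical helper: returns an absolute http/https URL,
    // or null when none can be formed from the input.
    public static string Normalize(string input)
    {
        if (string.IsNullOrWhiteSpace(input)) return null;

        string candidate = input.Trim();

        // Assumption: input without "://" is a bare host or path meant as http.
        if (!candidate.Contains("://"))
        {
            candidate = "http://" + candidate;
        }

        Uri uri;
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out uri)) return null;

        // Only web schemes are of interest for a broken-link checker.
        bool isWeb = uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps;
        return isWeb ? uri.AbsoluteUri : null;
    }
}
```

The normalized string can then be handed to WebRequest.Create, and a null result can be reported as a malformed URL without making a request at all.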
Upvotes: -1