Reputation: 7870
In .NET/C#, I want to validate some HTML code. For instance, I have the following HTML:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head><title></title></head>
<body>
CDATA section number 1?
</body>
</html>
I have the following C# code:
string htmlCode = ... // for instance the html above
var settings = new XmlReaderSettings { ValidationType = ValidationType.DTD };
settings.ValidationEventHandler += delegate(object s, ValidationEventArgs e)
{
    throw new XmlException(e.Message);
};
using (var srdr = new StringReader(htmlCode))
using (var xrdr = new XmlTextReader(srdr))
using (var vrdr = XmlReader.Create(xrdr, settings))
{
    try
    {
        while (vrdr.Read()) { }
    }
    catch (XmlException ex)
    {
        // do some stuff
    }
}
When I run this code, I get this exception:
System.Net.WebException : The remote server returned an error: (403) Forbidden.
at System.Net.HttpWebRequest.GetResponse()
What's wrong with what I've done? Thanks in advance for your help.
Upvotes: 0
Views: 442
Reputation: 842
It looks like your code is trying to download the DTD from http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd,
which returns a 403 (try opening it in your browser).
Note: Lucero's link explains why it returns 403.
Upvotes: 1
Reputation: 19635
The response code you're getting is an HTTP code stating that you are forbidden access to the resource you're trying to retrieve. This could be for a number of reasons:
Server settings - The server may disallow ALL attempts to access the resource. To check for this, try accessing it from a browser. If you get the same error in the browser, then it's likely that your issue is the server configuration.
Blocked user agent - Sometimes only certain user agents are allowed to access a resource. This is done to prevent automated website crawlers from scraping the info in the resource. If the server filters requests by their User-Agent header, there's a chance that your program is being blocked. (A robots.txt file hints that the site restricts crawlers, but it does not enforce anything by itself.)
Authentication needed - If the server you're accessing requires authentication (such as basic or digest auth) then you need to provide credentials along with your request. Again, this can be checked with the browser. If the resource requires authentication, you should get a popup in the browser requesting user/pass info.
There are probably other reasons you could be getting this code, but these are the first three I could think of off the top of my head.
Upvotes: 0
Reputation: 60190
It's not your code. The W3C deliberately rejects automated requests for its DTDs because of excessive traffic:
http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic
You need to supply the DTD yourself, for instance by using a custom XmlResolver
which returns the DTD from a local resource.
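A minimal sketch of such a resolver, under some assumptions: the class name LocalDtdResolver and the folder argument are made up for illustration, and the folder is assumed to already contain xhtml1-transitional.dtd plus the three entity files it references (xhtml-lat1.ent, xhtml-symbol.ent, xhtml-special.ent), saved beforehand. Anything that is not under /TR/xhtml1/DTD/ on www.w3.org falls through to the default behaviour.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical resolver (not part of the framework): answers requests for the
// XHTML 1.0 DTD files from a local folder instead of downloading them.
public class LocalDtdResolver : XmlUrlResolver
{
    // Assumed to hold local copies of the .dtd and .ent files.
    private readonly string dtdFolder;

    public LocalDtdResolver(string dtdFolder)
    {
        this.dtdFolder = dtdFolder;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        // The DTD and the entity files it pulls in all resolve to this path on www.w3.org.
        if (absoluteUri.Host == "www.w3.org" &&
            absoluteUri.AbsolutePath.StartsWith("/TR/xhtml1/DTD/", StringComparison.Ordinal))
        {
            string fileName = Path.GetFileName(absoluteUri.AbsolutePath);
            return File.OpenRead(Path.Combine(dtdFolder, fileName));
        }

        // Everything else keeps the default XmlUrlResolver behaviour.
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}
```

To use it, assign an instance to the XmlResolver property of the XmlReaderSettings (on .NET 4 and later, DtdProcessing = DtdProcessing.Parse also has to be set there). If you keep the intermediate XmlTextReader from the question, the resolver probably needs to be set on that reader's XmlResolver property instead, since that reader is the one loading the DTD.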
Upvotes: 2