pierroz

Reputation: 7870

Unexpected exception while validating XML code

In .NET/C#, I want to validate some HTML code. For instance, I have the following HTML:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title></title></head>
  <body>
   CDATA section number 1?
  </body>
</html>

I have the following C# code:

using System.IO;
using System.Xml;
using System.Xml.Schema;

string htmlCode = ... // for instance the html above
var settings = new XmlReaderSettings { ValidationType = ValidationType.DTD };
settings.ValidationEventHandler += delegate(object s, ValidationEventArgs e)
{
    throw new XmlException(e.Message);
};
using (var srdr = new StringReader(htmlCode))
using (var xrdr = new XmlTextReader(srdr))
using (var vrdr = XmlReader.Create(xrdr, settings))
{
    try
    {
        while (vrdr.Read()) { }
    }
    catch (XmlException ex)
    {
        // do some stuff
    }
}

When I run this code, I get this exception:

System.Net.WebException : The remote server returned an error: (403) Forbidden.
at System.Net.HttpWebRequest.GetResponse()

What's wrong with what I've done? Thanks in advance for your help.

Upvotes: 0

Views: 442

Answers (3)

M-Peror

Reputation: 842

It looks like your code is trying to download http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd, which returns a 403 (try opening it in your browser).

Note: Lucero's link explains why it returns a 403.
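One way to confirm this diagnosis without touching the network is to plug in a resolver that only records which URIs the parser asks for and refuses to fetch them. This is a sketch, assuming .NET Framework 4.0 or later (or modern .NET) for `DtdProcessing`; `RecordingResolver` and `DtdFetchProbe` are hypothetical names, not framework types. With the question's document, the recorded URIs should include the w3.org DTD address.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical diagnostic resolver: records every URI the parser requests
// and refuses to load it, so nothing is ever sent over the network.
class RecordingResolver : XmlUrlResolver
{
    public List<Uri> Requested { get; } = new List<Uri>();

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        Requested.Add(absoluteUri);
        throw new XmlException("Refusing to fetch " + absoluteUri);
    }
}

static class DtdFetchProbe
{
    // Parses the document with DTD processing enabled and returns every URI
    // the reader tried to load (the external DTD subset among them).
    public static List<Uri> RequestedUris(string xml)
    {
        var resolver = new RecordingResolver();
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            XmlResolver = resolver
        };
        try
        {
            using (var sr = new StringReader(xml))
            using (var reader = XmlReader.Create(sr, settings))
            {
                while (reader.Read()) { }
            }
        }
        catch (XmlException)
        {
            // Expected: the external DTD could not be opened because the resolver refused it.
        }
        return resolver.Requested;
    }
}
```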

Upvotes: 1

Brian Driscoll

Reputation: 19635

The response you're getting is HTTP status 403, which means the server is refusing access to the resource you're trying to retrieve. This could happen for a number of reasons:

  1. Server settings - The server may disallow ALL attempts to access the resource. To check for this, try accessing it from a browser. If you get the same error in the browser, then it's likely that your issue is the server configuration.

  2. Blocked user agent - Sometimes only certain user agents are allowed to access a resource. This is often done to keep automated crawlers from scraping the resource. If the site you're accessing has a robots.txt file, there's a chance that your program is being blocked.

  3. Authentication needed - If the server you're accessing requires authentication (such as basic or digest auth), then you need to provide credentials along with your request. Again, you can check this with a browser: if the resource requires authentication, the browser should prompt you for a username and password.

There are probably other reasons you could be getting this code, but these are the first three I could think of off the top of my head.

Upvotes: 0

Lucero

Reputation: 60190

It's not your code.

http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic

You need to supply the DTD yourself, for instance by using a custom XmlResolver which returns the DTD from a local resource.
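A minimal sketch of this approach, assuming .NET Framework 4.0 or later (or modern .NET). Instead of a hand-written resolver, it uses the built-in `XmlPreloadedResolver` from `System.Xml.Resolvers`, which ships with local copies of the XHTML 1.0 DTDs and their entity files (`XmlKnownDtds.Xhtml10`), so the validating reader should never contact w3.org. `XhtmlValidator.Validate` is a hypothetical helper name, and it collects validation errors into a list rather than throwing from the handler as the question's code does.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Resolvers;
using System.Xml.Schema;

static class XhtmlValidator
{
    // Validates an XHTML 1.0 document against its DTD using locally preloaded
    // copies of the DTDs. Returns the error messages (empty list = valid).
    public static List<string> Validate(string xhtml)
    {
        var errors = new List<string>();
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,   // DTDs are prohibited by default
            ValidationType = ValidationType.DTD,
            XmlResolver = new XmlPreloadedResolver(XmlKnownDtds.Xhtml10)
        };
        settings.ValidationEventHandler += (sender, e) =>
        {
            // Keep errors only; warnings are ignored in this sketch.
            if (e.Severity == XmlSeverityType.Error)
                errors.Add(e.Message);
        };

        using (var sr = new StringReader(xhtml))
        using (var reader = XmlReader.Create(sr, settings))
        {
            while (reader.Read()) { }
        }
        return errors;
    }
}
```

For a DTD that isn't preloaded, a subclass of `XmlUrlResolver` that overrides `GetEntity` to return a local `FileStream` (or an embedded resource stream) would fill the same role.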

Upvotes: 2
