Kevin

Reputation: 3730

loading xml document fails with special character »

I'm consuming an RSS feed and the document contains the special character » (it appears in the feed as &raquo;).

I'm guessing the feed is not encoded properly, but I can't change that. I'd like to work around it, or just replace the offending character with something friendly.

using (Stream stream = response.GetResponseStream())
{
    using (XmlReader reader = XmlReader.Create(stream))
    {
        try
        {
            XmlDocument xmlDoc = new XmlDocument();
            xmlDoc.Load(reader);  // <--- FAILS HERE
            // parse the items of the feed

...

Upvotes: 2

Views: 4988

Answers (2)

bobince

Reputation: 536587

+1 what Frédéric said. You can also serve » as a raw unescaped character, presumably encoded in UTF-8.

If it's someone else's RSS feed, you need to kick them to stop producing malformed XML; no conforming XML parser will accept an undeclared entity like this.

In a <description> element, the HTML content should normally be XML-escaped. So if the description of the item is This is a <em>really</em> interesting article, it should appear in the XML as:

<description>This is a &lt;em>really&lt;/em> interesting article</description>

Consequently, an HTML-encoded » character should have come out as

&amp;raquo;

If it was included directly from an HTML source without being escaped, that's a more serious XML-injection problem.
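As a rough sketch of what the feed producer should be doing (assuming .NET and LINQ to XML; the `DescriptionDemo` class and its `Build` method are names invented for this example), letting an XML API write the text content takes care of the escaping:

```csharp
using System;
using System.Xml.Linq;

static class DescriptionDemo
{
    // Hypothetical helper: wraps HTML content in a <description> element.
    // XElement escapes markup-significant characters when it is serialized.
    public static string Build(string html)
    {
        var description = new XElement("description", html);
        return description.ToString();
    }

    static void Main()
    {
        // Sample content, illustrative only.
        string html = "This is a <em>really</em> interesting article &raquo;";
        Console.WriteLine(Build(html));
    }
}
```

In the serialized element the `<` of the HTML tags comes out as `&lt;` and the `&` of `&raquo;` comes out as `&amp;`, so the result is well-formed XML even though the payload is HTML.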

(This is assuming RSS 2.0. In the various earlier versions of RSS, whether the <description> contained HTML or plain text varied from spec to spec and was sometimes completely unspecified. For old RSS versions it's not really reliable to use HTML content at all.)

Upvotes: 1

Frédéric Hamidi

Reputation: 263047

&raquo; is an HTML named entity and is not supported in XML. Out of the box, XML only supports &amp;, &apos;, &quot;, &gt; and &lt;.
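A quick way to see the difference, sketched with `XmlDocument.LoadXml` (the `EntityCheck` class, the `Parses` helper and the sample strings are made up for illustration):

```csharp
using System;
using System.Xml;

static class EntityCheck
{
    // Hypothetical helper: true if the string parses as well-formed XML.
    public static bool Parses(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            // References to undeclared entities such as &raquo; end up here.
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(Parses("<t>a &amp; b &lt; c</t>")); // predefined entities only
        Console.WriteLine(Parses("<t>more &raquo;</t>"));     // HTML-only named entity
    }
}
```

The first string uses only predefined entities and loads; the second references an entity that no DTD declares, so the parser rejects it.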

Use the corresponding numeric character reference &#187; (or hexadecimal &#xbb;) instead.
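If you can't get the feed fixed, one workaround is to patch the text before parsing it. A minimal sketch, assuming the feed is UTF-8 and small enough to read into memory; `FeedLoader` and `LoadWithRaquoFix` are hypothetical names, not part of any library:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class FeedLoader
{
    // Hypothetical helper: reads the whole stream as text, swaps the HTML-only
    // entity for its numeric character reference, then parses the result.
    public static XmlDocument LoadWithRaquoFix(Stream stream)
    {
        string xml;
        // Assumption: the feed is UTF-8; adjust if the feed declares otherwise.
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            xml = reader.ReadToEnd();
        }

        xml = xml.Replace("&raquo;", "&#187;");

        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc;
    }

    static void Main()
    {
        // Stand-in for the real response stream.
        byte[] bytes = Encoding.UTF8.GetBytes("<rss><title>more &raquo;</title></rss>");
        using (var stream = new MemoryStream(bytes))
        {
            XmlDocument doc = LoadWithRaquoFix(stream);
            Console.WriteLine(doc.DocumentElement.InnerText);
        }
    }
}
```

In the question's code you would pass the result of `response.GetResponseStream()` to the helper instead of calling `XmlReader.Create`. Note that a blind string replace also rewrites `&raquo;` inside CDATA sections, where it is literal text, and it only handles this one entity.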

Upvotes: 6
