shakyjake

Reputation: 123

Parsing XML attributes with embedded double quotes in their values using the XmlDocument class

This is a web project. I receive a partial HTML string from an external source. Parsing it with XmlDocument works well, except when it encounters an attribute value that contains embedded double quotes, such as the style attribute below:

<span id="someId" style="font-family:"Calibri", Sans-Serif;">Some Text</span>

It seems (though I could be wrong) that LoadXml() treats the double quote before Calibri as the end of the style attribute value, and then sees Calibri as another "token" (the term used in the error message).

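var xml = new XmlDocument();
var html = "<span id=\"someId\" style=\"font-family:\"Calibri\", Sans-Serif;\">Some Text</span>"; // the HTML above as a C# string literal
xml.LoadXml(html); // <--- here is where I get the error message below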

"'Calibri' is an unexpected token. Expecting white space. Line 1, position 18."

I could use a regex to replace the inner quotes, but it would be rather ugly, and I may well end up doing it anyway!

I thought HtmlAgilityPack might help, but I couldn't find good documentation for it, and I would rather avoid third-party libraries with sparse documentation.

Is there a way to make LoadXml() accept it (and then have the Attributes collection parse it correctly)? I don't hold out much hope, but I'm throwing it out there anyway. Or should I be using a different class altogether instead of XmlDocument? I am open to a third-party library if it has good documentation.

Upvotes: 1

Views: 1721

Answers (1)

John Saunders

Reputation: 161783

That data is invalid: it is not well-formed XML. An attribute value delimited by double quotes cannot contain a literal double quote, and an attribute value delimited by single quotes cannot contain a literal single quote.

Valid:

<tag attr1="value with 'single' quotes" attr2='value with "double" quotes' />

Invalid:

<tag attr1="value with "double" quotes" attr2='value with 'single' quotes' />

The invalid example can be made valid by writing the inner quotes as the &quot; and &apos; entities:

<tag attr1="value with &quot;double&quot; quotes" attr2='value with &apos;single&apos; quotes' />
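To illustrate the answer in the question's own language, here is a minimal sketch, assuming .NET's System.Xml: LoadXml rejects the literal inner quotes, while both the &quot; form and a single-quoted attribute parse, and GetAttribute returns the value with real double quotes in it. The class QuoteDemo and its methods GetAttribute and IsRejected are hypothetical names chosen for this example.

```csharp
using System;
using System.Xml;

// Hypothetical demo class, for illustration only.
public static class QuoteDemo
{
    // Loads a one-element XML fragment and returns the decoded value of the named attribute.
    public static string GetAttribute(string xml, string name)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // throws XmlException if the markup is not well-formed
        return doc.DocumentElement.GetAttribute(name);
    }

    // Returns true if XmlDocument.LoadXml refuses the markup.
    public static bool IsRejected(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // Literal double quotes inside a double-quoted attribute: not well-formed.
        string bad = "<span id=\"someId\" style=\"font-family:\"Calibri\", Sans-Serif;\">Some Text</span>";
        // Same value with the inner quotes written as &quot; entities.
        string escaped = "<span id=\"someId\" style=\"font-family:&quot;Calibri&quot;, Sans-Serif;\">Some Text</span>";
        // Same value, but the attribute is delimited by single quotes instead.
        string singleQuoted = "<span id=\"someId\" style='font-family:\"Calibri\", Sans-Serif;'>Some Text</span>";

        Console.WriteLine(IsRejected(bad));                     // True
        Console.WriteLine(GetAttribute(escaped, "style"));      // font-family:"Calibri", Sans-Serif;
        Console.WriteLine(GetAttribute(singleQuoted, "style")); // font-family:"Calibri", Sans-Serif;
    }
}
```

Because this is a well-formedness error, a conforming XML parser must treat it as fatal, so either fix has to be applied to the string before LoadXml sees it.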

Upvotes: 5
