David Gouge

Reputation: 83

Keep HTML tags in XML using LINQ to XML

I have an XML file from which I am extracting HTML using LINQ to XML. This is a sample of the file:

<?xml version="1.0" encoding="utf-8" ?>
<tips>
    <tip id="0">
        This is the first tip.
    </tip>
    <tip id="1">
        Use <b>Windows Live Writer</b> or <b>Microsoft Word 2007</b> to create and publish content.
    </tip>
    <tip id="2">
        Enter a <b>url</b> into the box to automatically screenshot and index useful webpages.
    </tip>
    <tip id="3">
        Invite your <b>colleagues</b> to the site by entering their email addresses.  You can then share the content with them!
    </tip>
</tips>

I am using the following query to extract a 'tip' from the file:

Tip tip = (from t in tipsXml.Descendants("tip")
           where t.Attribute("id").Value == nextTipId.ToString()
           select new Tip()
           {
               TipText = t.Value,
               TipId = nextTipId
           }).First();

The problem I have is that the HTML elements are being stripped out. I was hoping for something like InnerHtml to use instead of Value, but that doesn't seem to exist.

Any ideas?

Thanks all in advance,

Dave

Upvotes: 5

Views: 3037

Answers (4)

Nunzio Davide Salvo

Reputation: 1

Just use:

string.Concat(element.Nodes()) 

to get the content with HTML tags.
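As a rough sketch of how that one-liner could slot into the query from the question: the Tip class below is a hypothetical stand-in (the question does not show its definition), and TipReader.GetTip is an illustrative name, not an existing API.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for the asker's Tip class (not shown in the question).
public class Tip
{
    public string TipText { get; set; }
    public int TipId { get; set; }
}

public static class TipReader
{
    // Illustrative helper: finds the tip with the given id and keeps its inner markup.
    public static Tip GetTip(XContainer tipsXml, int nextTipId)
    {
        return (from t in tipsXml.Descendants("tip")
                // The string cast returns null instead of throwing when "id" is missing.
                where (string)t.Attribute("id") == nextTipId.ToString()
                select new Tip
                {
                    // string.Concat calls ToString() on every child node,
                    // so text nodes and <b> elements are both serialized.
                    TipText = string.Concat(t.Nodes()),
                    TipId = nextTipId
                }).First();
    }
}
```

string.Concat calls ToString() on each node, so text comes back XML-escaped (for example & as &amp;) and child elements come back with their tags intact.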

Upvotes: 0

Jon Skeet

Reputation: 1503509

Call t.ToString() instead of Value. That will return the XML as a string. You may want to use the overload taking SaveOptions to disable formatting. I can't check right now, but I suspect it will include the outer element tag (and its attributes), so you would need to strip this off.

Note that if your HTML isn't valid XML, you will end up with an invalid overall XML file.

Is the format of the XML file completely out of your control? It would be nicer for any HTML inside to be XML-encoded.
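To illustrate what XML-encoding would buy you, here is a small sketch. It assumes you can change how the file is written; EncodedTips, ReadText and MakeTip are hypothetical names used only for this example.

```csharp
using System.Xml.Linq;

public static class EncodedTips
{
    // Reads a <tip> whose HTML was stored as escaped text or CDATA.
    // Value decodes the content, so the original markup comes back as-is.
    public static string ReadText(string tipXml)
    {
        XElement tip = XElement.Parse(tipXml);
        return tip.Value;
    }

    // Builds a <tip> from raw HTML. Passing the HTML as a string makes
    // LINQ to XML store it as text and escape it when the element is saved.
    public static XElement MakeTip(int id, string html)
    {
        return new XElement("tip", new XAttribute("id", id), html);
    }
}
```

With content stored as Use &lt;b&gt;Windows Live Writer&lt;/b&gt; (or wrapped in a CDATA section), the original query using t.Value would return the markup unchanged, and the HTML would no longer need to be well-formed XML.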

EDIT: One way of avoiding getting the outer part might be to do something like this (in a separate method called from your query, of course):

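// GetInnerXml is a hypothetical name for the separate method (requires System.Text and System.Xml.Linq).
static string GetInnerXml(XElement element)
{
    StringBuilder builder = new StringBuilder();
    // Append each child node (text or element) in document order.
    foreach (XNode node in element.Nodes())
    {
        builder.Append(node.ToString());
    }
    return builder.ToString();
}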

That way you'll get HTML elements with their descendants and interspersed text nodes. Basically it's the equivalent of InnerXml, I strongly suspect.
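A variation on the loop above that also applies the SaveOptions suggestion, written as an extension method. This is only a sketch: InnerXml is a hypothetical name, since XElement has no such member built in.

```csharp
using System.Text;
using System.Xml.Linq;

public static class XElementExtensions
{
    // Hypothetical extension method: serializes the children of an element
    // without the element's own start and end tags.
    public static string InnerXml(this XElement element)
    {
        var builder = new StringBuilder();
        foreach (XNode node in element.Nodes())
        {
            // DisableFormatting stops nested element-only content
            // (e.g. <ul><li>..</li></ul>) from being re-indented.
            builder.Append(node.ToString(SaveOptions.DisableFormatting));
        }
        return builder.ToString();
    }
}
```

In the query this would read TipText = t.InnerXml(). Without DisableFormatting, a child such as a list with only element children would be re-indented across several lines, which may or may not matter once the HTML is rendered.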

Upvotes: 8

Vijay kumar EK

Reputation: 11

Just use string.Concat(tip.Nodes()) to get the content with HTML tags.

Upvotes: 1

bobince

Reputation: 536715

TipText= t.Value,

XElement.Value returns the concatenated text of the element and all of its descendants, with the tags themselves dropped. The words inside nested elements (HTML or otherwise) are kept, but the markup around them is not, and of course any &-entity-references will appear in their decoded form.

If you want the content as a string with markup you could call XElement.ToString(), possibly with SaveOptions.DisableFormatting. But note this includes the wrapping <tip> element; that is, in web browser DOM terms, it's the outerHTML, not the innerHTML. To get the innerHTML you would have to join together the ToString() results of all the child nodes returned by XElement.Nodes().
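For a side-by-side comparison of the three views, something like the following could be used. TipViews and its method names are made up for this sketch.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class TipViews
{
    // Text content only: tags dropped, entity references decoded.
    public static string TextOnly(XElement e) => e.Value;

    // "outerHTML": markup including the wrapping element itself.
    public static string Outer(XElement e) =>
        e.ToString(SaveOptions.DisableFormatting);

    // "innerHTML": the serialized child nodes joined together.
    public static string Inner(XElement e) =>
        string.Join("", e.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
}
```

For the input <tip id="1">Tips &amp; tricks: use <b>Windows Live Writer</b>.</tip>, TextOnly gives Tips & tricks: use Windows Live Writer., Outer gives the whole element back including <tip id="1">, and Inner gives Tips &amp; tricks: use <b>Windows Live Writer</b>. with the entity still escaped.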

Upvotes: 0
