Preserve self-closing tags on extraction

Question

Consider the following example:


   
    <inner>test</inner>
    <inner>test &amp; test <br /><br />test</inner>

I would like to extract the string enclosed within the second inner element while preserving the <br /> tags (or preferably getting them as <br>), but decoding all the HTML-encoded characters. That is, I would like to get:

"test & test <br /><br />test"

or

"test & test <br><br>test"

So far I have tried the following. It decodes the HTML-encoded characters but removes the <br /> tags completely.

    XDocument xDoc = XDocument.Load(file);
    XNamespace ns = XNamespace.Get("http://www.w3.org/1999/xhtml");
    var cas = xDoc.Descendants().First(e => e.Name.Equals(ns.GetName("case")));
    foreach (var row in cas.Elements())
    {
        var columnVals = row.Elements(ns.GetName("inner")).Select(e => e.Value);
        string str = columnVals.Skip(1).First();
        // str == "test & test test"
        // but I want:
        // "test & test <br /><br />test" or "test & test <br><br>test"
    }

Baldrick · Accepted Answer

Try the following:

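    // Requires: using System.Linq; using System.Net; using System.Xml.Linq;
    XDocument xDoc = XDocument.Load(file);
    XNamespace ns = XNamespace.Get("http://www.w3.org/1999/xhtml");
    var cas = xDoc.Descendants().First(e => e.Name.Equals(ns.GetName("case")));
    foreach (var row in cas.Elements())
    {
        // Nodes() keeps child elements such as <br />, unlike Value, which returns text only.
        var columnVals = row.Elements(ns.GetName("inner")).Select(e => e.Nodes());
        var str = columnVals.Skip(1).First();
        // Each node is serialized via ToString(), then entities such as &amp; are decoded.
        var stringResult = WebUtility.HtmlDecode(string.Join(" ", str));
    }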

It serializes the child nodes to strings, which keeps the <br /> tags, and then decodes any HTML escaping.

The output is:

    test & test  <br /> <br /> test
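
If the <br> form from the question is preferred, one option is to walk the child nodes yourself instead of joining their serialized form. The following is only a minimal sketch and not part of the original answer: InnerMarkup and Flatten are hypothetical names, and it assumes the only child elements that need to survive are br elements. It also avoids two side effects of joining serialized nodes: the extra spaces added by string.Join(" ", ...), and the xmlns attribute that ToString() can add to an element that lives in a namespace.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper for illustration; the names are not from the original answer.
public static class InnerMarkup
{
    // Concatenates the content of an element, writing every <br /> child as "<br>".
    public static string Flatten(XElement element)
    {
        var sb = new StringBuilder();
        foreach (XNode node in element.Nodes())
        {
            if (node is XText text)
            {
                // XText.Value is already entity-decoded ("&amp;" is read as "&").
                sb.Append(text.Value);
            }
            else if (node is XElement child && child.Name.LocalName == "br")
            {
                // Compare the local name only, so the XHTML namespace does not matter.
                sb.Append("<br>");
            }
            else if (node is XElement other)
            {
                // Assumption: any other child element contributes its text content only.
                sb.Append(other.Value);
            }
        }
        return sb.ToString();
    }
}
```

In the loop from the answer this would be called as InnerMarkup.Flatten(row.Elements(ns.GetName("inner")).Skip(1).First()). For the second inner element of the example it should give "test & test <br><br>test"; append "<br />" instead of "<br>" if the self-closing form is wanted.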
