Reputation: 861
Consider the following example:
<case>
<outer>
<inner>test</inner>
<inner>test &amp; test <br /><br />test</inner>
<inner></inner>
</outer>
</case>
I would like to extract the string enclosed within the second inner element while preserving br tags (or preferably getting them as \n), but decoding all the HTML encoded characters. That is, I would like to get:
"test & test \n\ntest"
or
"test & test <br /><br />test"
So far I have tried the following. It decodes the HTML-encoded characters but removes the <br /> tags completely.
XDocument xDoc = XDocument.Load(file);
XNamespace ns = XNamespace.Get("http://www.w3.org/1999/xhtml");
var cas = xDoc.Descendants().First(e => e.Name.Equals(ns.GetName("case")));
foreach (var row in cas.Elements())
{
var columnVals = row.Elements(ns.GetName("inner")).Select(e => e.Value);
string str = columnVals.Skip(1).First();
// str == "test & test test"
// but I want:
// "test & test \n\ntest" or "test & test <br /><br />test"
}
Upvotes: 0
Views: 187
Reputation: 11840
Try the following:
XDocument xDoc = XDocument.Load(file);
XNamespace ns = XNamespace.Get("http://www.w3.org/1999/xhtml");
var cas = xDoc.Descendants().First(e => e.Name.Equals(ns.GetName("case")));
foreach (var row in cas.Elements())
{
var columnVals = row.Elements(ns.GetName("inner")).Select(e => e.Nodes());
var str = columnVals.Skip(1).First();
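// WebUtility is in System.Net. string.Join calls ToString() on each node,
// which keeps <br /> as markup and leaves text nodes XML-escaped until decoded.
var stringResult = WebUtility.HtmlDecode(string.Join(" ", str));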
}
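This serializes each child node of the element to a string, so the <br /> elements are kept as markup, and then decodes any HTML-escaped characters such as &amp;.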
The output is:
test & test <br /> <br /> test
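If you would rather get "\n" in place of each <br />, one possible approach is to walk the child nodes yourself instead of joining their string forms. The following is only a minimal sketch: the InnerText class and WithLineBreaks method are hypothetical names made up for illustration, and it assumes the only markup you care about inside the element is br.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class InnerText
{
    // Hypothetical helper (not from the original answer): concatenates an
    // element's child nodes, turning each <br /> element into "\n".
    public static string WithLineBreaks(XElement element)
    {
        return string.Concat(element.Nodes().Select(node =>
        {
            // XText.Value is already entity-decoded ("&amp;" is read as "&").
            if (node is XText text) return text.Value;
            // Compare the local name so this works with or without the XHTML namespace.
            if (node is XElement child && child.Name.LocalName == "br") return "\n";
            // Any other element contributes its text content; comments etc. are skipped.
            return node is XElement other ? other.Value : string.Empty;
        }));
    }
}
```

In the loop above this could be called as InnerText.WithLineBreaks(row.Elements(ns.GetName("inner")).Skip(1).First()). Because the text nodes are read through Value rather than ToString(), no separate HtmlDecode call should be needed for standard XML entities.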
Upvotes: 1