I am deserializing a large XML document into a C# object.
I've run into an issue where an element mixes plain text with multiple child elements on the same line, and I'm having trouble reconstructing them properly in code.
Here is an example snippet:
```xml
<parent>
  <ce:para view="all">
    Text <ce:cross-ref refid="123">[1]</ce:cross-ref> More Text <ce:italic>Italicized text</ce:italic> and more text here
  </ce:para>
  <ce:para>...</ce:para>
</parent>
```
The generated C# class looks like this:

```csharp
[XmlRoot(ElementName = "para", Namespace = "namespace")]
public class Para
{
    [XmlElement(ElementName = "cross-ref", Namespace = "namespace")]
    public List<Crossref> Crossref { get; set; }

    [XmlText]
    public List<string> Text { get; set; }

    [XmlElement(ElementName = "italic", Namespace = "namespace")]
    public List<Italic> Italic { get; set; }
}
```
I want to loop over this object and reconstruct the sentence as a plain string:
Text [1] More Text Italicized text and more text here
The problem is that when the deserialization happens, the order is lost, because each piece is put into its respective list. That means I have no way of knowing how to put the string back together the way it is supposed to be:
Text: {"Text", "More Text", "and more text here"}
Crossref: {"[1]"}
Italic: {"Italicized text"}
I've thought about bringing the whole element in as a string and then scrubbing the tags out of it, but I'm not sure how to deserialize it that way, or whether there is a better way to go about it.
Disclaimer: I am not able to alter the XML document as it is coming in from a 3rd party.
Thanks
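One alternative to scrubbing tags by hand: `XmlSerializer` can keep the order itself. The documentation for `XmlTextAttribute` allows the attribute on an `object[]` member when its `Type` is set to `string`, and combining that with one `[XmlElement]` per inline element gives a single array holding text nodes and child elements in document order. A minimal sketch that reuses the `Para`, `Crossref` and `Italic` names from the question but reshapes them; the `Items` member, the `Flatten`/`Parse` methods, the `Ce` class and the namespace URI `urn:example:ce` are hypothetical stand-ins, not part of the original code:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical placeholder: use the URI the real document binds to the ce: prefix.
public static class Ce
{
    public const string Ns = "urn:example:ce";
}

public class Crossref
{
    [XmlAttribute("refid")]
    public string RefId { get; set; }

    [XmlText]
    public string Value { get; set; }
}

public class Italic
{
    [XmlText]
    public string Value { get; set; }
}

[XmlRoot("para", Namespace = Ce.Ns)]
public class Para
{
    [XmlAttribute("view")]
    public string View { get; set; }

    // Text nodes (as string) and child elements share one array,
    // so the serializer fills it in document order.
    [XmlText(typeof(string))]
    [XmlElement("cross-ref", typeof(Crossref), Namespace = Ce.Ns)]
    [XmlElement("italic", typeof(Italic), Namespace = Ce.Ns)]
    public object[] Items { get; set; }

    // Walks Items in order and appends the text of each piece.
    public string Flatten()
    {
        var sb = new StringBuilder();
        foreach (object item in Items ?? Array.Empty<object>())
        {
            switch (item)
            {
                case string s: sb.Append(s); break;
                case Crossref c: sb.Append(c.Value); break;
                case Italic i: sb.Append(i.Value); break;
            }
        }
        return sb.ToString();
    }
}

public static class Program
{
    // The question's paragraph on one line, with the hypothetical URI bound to ce:.
    public const string SampleXml =
        "<ce:para xmlns:ce=\"urn:example:ce\" view=\"all\">Text " +
        "<ce:cross-ref refid=\"123\">[1]</ce:cross-ref> More Text " +
        "<ce:italic>Italicized text</ce:italic> and more text here</ce:para>";

    public static Para Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(Para));
        using var reader = new StringReader(xml);
        return (Para)serializer.Deserialize(reader);
    }

    public static void Main()
    {
        Console.WriteLine(Parse(SampleXml).Flatten());
    }
}
```

Elements that no `[XmlElement]` attribute covers (some other inline tag, say) are skipped during deserialization along with their text, so each inline element type you expect needs its own attribute and `case`.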
As per Chris' request, I'm posting my solution. It could probably use some refining, as I'm not very experienced with LINQ queries.
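```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// xmlAdapter.GetAsXDoc is my own helper (not shown); it returns an XDocument.
XDocument xdoc = xmlAdapter.GetAsXDoc(xmlstring);

// Matching on LocalName ignores the namespace behind the ce: prefix.
IEnumerable<XElement> body = from b in xdoc.Descendants()
                             where b.Name.LocalName == "body"
                             select b;
IEnumerable<XElement> sections = from s in body.Descendants()
                                 where s.Name.LocalName == "sections"
                                 select s;
IEnumerable<XElement> paragraphs = from p in sections.Descendants()
                                   where p.Name.LocalName == "para"
                                   select p;

string bodytext = "";
if (paragraphs.Any())
{
    StringBuilder text = new StringBuilder();
    foreach (XElement p in paragraphs)
    {
        // Value concatenates all descendant text in document order,
        // so the inline tags are dropped but the sentence order is kept.
        text.AppendFormat("{0} ", p.Value);
    }
    // 'text' is only in scope inside this block, so read it here.
    bodytext = text.ToString();
}
```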
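The part doing the real work is `XElement.Value`, which returns the concatenated text of all descendant text nodes in document order, so the inline markup disappears while the order survives. A compact, self-contained check of that idea against the snippet from the question; it skips the `body`/`sections` filtering (the snippet has neither), joins with `string.Join` to avoid the trailing space, and the `ParagraphText` class and the namespace URI `urn:example:ce` are hypothetical:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ParagraphText
{
    // The question's snippet on one line, with a hypothetical URI bound to ce:.
    public const string SampleXml =
        "<parent xmlns:ce=\"urn:example:ce\">" +
        "<ce:para view=\"all\">Text <ce:cross-ref refid=\"123\">[1]</ce:cross-ref>" +
        " More Text <ce:italic>Italicized text</ce:italic> and more text here</ce:para>" +
        "<ce:para>...</ce:para>" +
        "</parent>";

    // Joins the Value of every element whose local name is "para",
    // matching on LocalName the same way the query above does.
    public static string Extract(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        var paras = doc.Descendants().Where(e => e.Name.LocalName == "para");

        // XElement.Value: all descendant text in document order, tags removed.
        return string.Join(" ", paras.Select(p => p.Value));
    }

    public static void Main()
    {
        Console.WriteLine(Extract(SampleXml));
    }
}
```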
Once you have deserialized the third-party XML into an object that directly matches the XML's schema (as you have already done in your example above), you should be able to use the `XmlNode.InnerText` property on the `<ce:para>` node to extract what you're looking for without having to write any parsing code.
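Note that `InnerText` lives on `XmlNode` (the `XmlDocument` DOM), so it is read from the raw XML rather than from the deserialized class. A minimal sketch, assuming the ce: prefix is bound to a hypothetical URI `urn:example:ce` (substitute the real one from the document) and using a hypothetical `InnerTextDemo` class:

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class InnerTextDemo
{
    // The question's snippet on one line, with the hypothetical URI bound to ce:.
    public const string SampleXml =
        "<parent xmlns:ce=\"urn:example:ce\">" +
        "<ce:para view=\"all\">Text <ce:cross-ref refid=\"123\">[1]</ce:cross-ref>" +
        " More Text <ce:italic>Italicized text</ce:italic> and more text here</ce:para>" +
        "<ce:para>...</ce:para>" +
        "</parent>";

    public static List<string> ParagraphTexts(string xml, string ceNamespace)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // XPath needs the prefix registered against the namespace URI.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("ce", ceNamespace);

        var result = new List<string>();
        foreach (XmlNode para in doc.SelectNodes("//ce:para", ns))
        {
            // InnerText concatenates all descendant text in document order.
            result.Add(para.InnerText);
        }
        return result;
    }

    public static void Main()
    {
        foreach (string text in ParagraphTexts(SampleXml, "urn:example:ce"))
            Console.WriteLine(text);
    }
}
```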
At that point, you could map the object deserialized from the raw third-party XML onto a separate object that flattens the `<ce:para>` node into a simple string.
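One way to combine both steps, sketched under assumptions: capture each `<ce:para>` as a raw `XmlElement` with `[XmlAnyElement]` during deserialization, then translate it into a flat object using `InnerText`. The `Parent`, `FlatParagraph` and `Flattener` names and the namespace URI `urn:example:ce` are hypothetical:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Serialization;

[XmlRoot("parent")]
public class Parent
{
    // Keeps every ce:para as a raw element instead of mapping its children.
    // "urn:example:ce" is a hypothetical stand-in for the real namespace.
    [XmlAnyElement("para", Namespace = "urn:example:ce")]
    public XmlElement[] Paras { get; set; }
}

// The flattened shape: one plain string per paragraph.
public class FlatParagraph
{
    public string Text { get; set; }
}

public static class Flattener
{
    public const string SampleXml =
        "<parent xmlns:ce=\"urn:example:ce\">" +
        "<ce:para view=\"all\">Text <ce:cross-ref refid=\"123\">[1]</ce:cross-ref>" +
        " More Text <ce:italic>Italicized text</ce:italic> and more text here</ce:para>" +
        "<ce:para>...</ce:para>" +
        "</parent>";

    public static FlatParagraph[] Flatten(string xml)
    {
        var serializer = new XmlSerializer(typeof(Parent));
        using var reader = new StringReader(xml);
        var parent = (Parent)serializer.Deserialize(reader);

        // InnerText drops the inline tags but keeps the text in document order.
        return (parent.Paras ?? Array.Empty<XmlElement>())
            .Select(p => new FlatParagraph { Text = p.InnerText })
            .ToArray();
    }

    public static void Main()
    {
        foreach (FlatParagraph p in Flatten(SampleXml))
            Console.WriteLine(p.Text);
    }
}
```

Keeping the raw `XmlElement` on the deserialized side means the mapping step is a one-liner, at the cost of losing strongly typed access to the cross-ref and italic children.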