Reputation: 4583

Extract Xml Element from a larger string

I have a string that starts with an XML element, followed by regular text after the element has ended.

Like so:

<SomeElement SomeAtt="SomeValue"><SomeChild/></SomeElement> More random text.

I want to parse the first part into an XElement and put the text that follows it into a separate string variable. I considered simply counting angle brackets, but legal XML can contain constructs that would throw the count off, so I would prefer to use the out-of-the-box parsers. I have tried XmlReader and the XElement.Parse method, and I would like them to stop once the element has been read instead of throwing an exception because of the unexpected text after it. So far I haven't managed that. XmlReader has a ReadSubtree method, but I couldn't get it to work.

Any ideas?

Edit

Additional Info: The random text may contain angle brackets.
Additional Info: In general, the XML may contain XML comments, which may in turn contain non-matching brackets. A generally applicable solution should account for this, although it is not necessary in my specific case.

Upvotes: 1

Answers (2)

jdweng

Reputation: 34433

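For XmlReader to read your input without errors under its default settings, the input has to be a well-formed document: exactly one root node, with no text after it. An XML declaration (normally the first line) is customary but optional.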

Upvotes: 0

har07

Reputation: 89325

One simple approach may be to wrap the entire string in a root node, which makes it valid XML that XElement or XDocument can parse:

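using System;
using System.Linq;
using System.Xml.Linq;

var xml = @"<SomeElement SomeAtt=""SomeValue""><SomeChild/></SomeElement> More random text.";
// Wrap the input in a synthetic root so the whole string is one well-formed document.
xml = string.Format("<root>{0}</root>", xml);
var doc = XDocument.Parse(xml);
var element = doc.Root.Elements().First();   // the leading XML element
var trailingString = doc.Root.LastNode;      // the text after it (an XText node for this input)

Console.WriteLine(element.ToString());
Console.WriteLine();
Console.WriteLine(trailingString.ToString());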

Console Output:

<SomeElement SomeAtt="SomeValue">
  <SomeChild />
</SomeElement>

 More random text.
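
Since the question asks for the trailing text as a string variable, here is a minimal sketch of the same wrapping idea packaged as a helper. The class name LeadingElement and the method Split are hypothetical names chosen for illustration. The sketch assumes the trailing text is itself well-formed XML content: a stray < or & in it will still make the parser throw an XmlException, because the wrapped string as a whole must be well-formed.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class LeadingElement
{
    // Hypothetical helper: splits "<elem>...</elem>trailing text" into its two parts.
    public static (XElement Element, string Trailing) Split(string input)
    {
        // Wrap in a synthetic root; PreserveWhitespace keeps whitespace-only text nodes.
        var root = XElement.Parse("<root>" + input + "</root>", LoadOptions.PreserveWhitespace);

        // First() throws InvalidOperationException if the input contains no element.
        var element = root.Elements().First();

        // Text nodes contribute their unescaped value; any other node is re-serialized.
        var trailing = string.Concat(
            element.NodesAfterSelf().Select(n =>
                n is XText text ? text.Value : n.ToString(SaveOptions.DisableFormatting)));

        return (element, trailing);
    }
}
```

Calling var (element, rest) = LeadingElement.Split(xml); on the sample string leaves " More random text." in rest. Collecting every node after the element, rather than only LastNode, keeps the result complete when the trailing text contains entities, CDATA sections, or further well-formed tags.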

Upvotes: 5
