Reputation: 33118
Let's say I have the following body of text:
Call me Ishmael. Some years ago- never mind how long precisely- having little
or no money in my purse, and nothing particular to interest me on shore, I
thought I would sail about a little and see the watery part of the world. It is
<?xml version="1.0" encoding="utf-8"?>
<RootElement xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<ChildElement />
<ChildElement />
</RootElement>
a way I have of driving off the spleen and regulating the circulation. Whenever
I find myself growing grim about the mouth; whenever it is a damp, drizzly
November in my soul;
What regex could I use that would return the XML embedded in the string?
NOTE: I can assume that <RootElement> and </RootElement> will always have the same name.
Upvotes: 0
Views: 1082
Reputation: 336468
I understand that the root element will not always be called RootElement, so you can use
<\?xml[^>]+>\s*<\s*(\w+).*?<\s*/\s*\1>
with RegexOptions.Singleline, so that the dot also matches newlines. This captures the first tag name after the opening <?xml ... ?> declaration and matches everything up to the first closing tag with the same name.
In C#:
resultString = Regex.Match(subjectString, @"<\?xml[^>]+>\s*<\s*(\w+).*?<\s*/\s*\1>", RegexOptions.Singleline).Value;
Upvotes: 2
Reputation: 888187
If you know that the root element will always be <RootElement ...> and that there will never be a nested <RootElement> tag, you can do it like this:
\<\?xml .+?\</RootElement\>
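This regex will lazily match all text from <?xml through the first </RootElement>. Because the XML spans several lines, use it with RegexOptions.Singleline so that the dot also matches newlines.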
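For illustration, here is a minimal C# sketch of how that pattern might be applied. The class name XmlExtractor, the method name Extract, and the shortened sample text are made up for this example. The pattern is written without the backslashes before < and >, which .NET treats as the same literal characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class XmlExtractor
{
    // Lazily matches from the XML declaration to the first </RootElement>.
    // Singleline makes '.' match newlines, so the match can span several lines.
    private static readonly Regex Pattern =
        new Regex(@"<\?xml .+?</RootElement>", RegexOptions.Singleline);

    // Hypothetical helper: returns the embedded XML,
    // or an empty string when nothing matches.
    public static string Extract(string text) => Pattern.Match(text).Value;

    public static void Main()
    {
        // Shortened stand-in for the text in the question (an assumption).
        string text =
            "see the watery part of the world. It is\n" +
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" +
            "<RootElement>\n<ChildElement />\n<ChildElement />\n</RootElement>\n" +
            "a way I have of driving off the spleen";

        Console.WriteLine(Extract(text));
    }
}
```

As noted above, this only holds up if <RootElement> is never nested inside itself, because the lazy match stops at the first closing tag it finds.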
Upvotes: 2