Reputation: 3280
I am working with an API, and for some crazy reason the XML it returns contains & characters that are not correctly escaped. This has left me in an annoying position: I get an exception when I try to parse the XML string with an XmlDocument.
I can use Replace to fix up the characters, but this could lead to issues.
xml = xml.Replace("&amp;", "&").Replace("&", "&amp;");
The problem with this is that the XML may already contain some correctly escaped values. A node like this will trip up the line of code above.
<node>Something & something &lt; annoying</node>
If I replace the & characters with &amp; it will break &lt;. I can't use the same approach for &lt; as I did for &amp;, because that would also convert the < and > brackets of the tags themselves.
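To make the failure concrete, here is a minimal sketch of what the two-step Replace does to that node. The class and method names (`ReplaceDemo`, `NaiveEscape`) are made up for illustration only.

```csharp
using System;

class ReplaceDemo
{
    // The two-step replace from above: unescape &amp; first,
    // then escape every ampersand that is left.
    public static string NaiveEscape(string xml) =>
        xml.Replace("&amp;", "&").Replace("&", "&amp;");

    static void Main()
    {
        var xml = "<node>Something & something &lt; annoying</node>";

        // The bare & is fixed, but the & inside &lt; is escaped as well.
        Console.WriteLine(NaiveEscape(xml));
        // prints: <node>Something &amp; something &amp;lt; annoying</node>
    }
}
```

The result parses, but the text now contains the literal characters "&lt;" instead of "<", so the data is silently changed.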
Here is a node that is giving trouble.
<CompanyName>Fire & Ice</CompanyName>
Upvotes: 2
Views: 655
Reputation: 755
I recommend XElement. It is a useful object, and XElement.Value will return the string that you want.
using System;
using System.Xml.Linq;

// Build the element tree in code; XElement escapes the & when serializing.
XElement y = new XElement("CompanyNames",
    new XElement("CompanyName", "Fire & Ice")
);
foreach (var item in y.Elements("CompanyName"))
{
    Console.WriteLine(item.Value);
}
The output will be "Fire & Ice".
Upvotes: -1
Reputation: 26213
You can use a regex similar to the one in this related question. It essentially matches all unescaped ampersands (i.e. it will match & but not &something;).
var xml = @"<node>Something & something &lt; annoying</node>";
var result = Regex.Replace(xml, @"&(?!\w*;)", "&amp;");
// output: <node>Something &amp; something &lt; annoying</node>
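One caveat: because # is not a word character, the \w*; lookahead does not recognize numeric references such as &#38; and would escape them a second time. The following is a hedged sketch of a stricter variant, not a definitive implementation. The names `AmpersandFixer` and `EscapeBareAmpersands` are hypothetical, and the sample input is invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

static class AmpersandFixer
{
    // Matches '&' unless it starts a named entity (&name;), a decimal
    // reference (&#38;) or a hexadecimal reference (&#x26;).
    static readonly Regex BareAmpersand =
        new Regex(@"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    public static string EscapeBareAmpersands(string xml) =>
        BareAmpersand.Replace(xml, "&amp;");

    static void Main()
    {
        // Invented sample: one bare &, one numeric reference, two named entities.
        var raw = "<CompanyName>Fire & Ice &#38; &lt;Co&gt;</CompanyName>";

        var doc = new XmlDocument();
        doc.LoadXml(EscapeBareAmpersands(raw));

        Console.WriteLine(doc.DocumentElement.InnerText);
        // prints: Fire & Ice & <Co>
    }
}
```

Like the original regex, this is a heuristic: text such as "AT&T;" still looks like an entity, and XmlDocument will reject named entities other than the five predefined ones (amp, lt, gt, quot, apos) unless a DTD declares them.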
Upvotes: 4