Reputation: 149
I'm trying to strip down some XML and extract only the value of a given field. However, the XML does not contain literal less-than and greater-than signs; they arrive encoded as &lt; and &gt;. Taking a substring around the field name (Date in the example below) works fine.
&lt;my:Date xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2014-07-27T23:04:34"&gt;2014-08-15&lt;/my:Date&gt;
However, I am unable to take a substring around the encoded less-than and greater-than markers. My code is as follows:
public string processReportXML(string field, string xml)
{
    try
    {
        string result = xml.Substring(xml.IndexOf(field));
        int resultIndex = result.LastIndexOf(field);
        if (resultIndex != -1) result = result.Substring(0, resultIndex);
        result = result.Substring(result.IndexOf("&gt;"));
        resultIndex = result.IndexOf("&lt;");
        if (resultIndex != -1) result = result.Substring(0, resultIndex);
        return field + ": " + result.Substring(4) + "\n";
    }
    catch (Exception e)
    {
        return field + " failed\n";
    }
}
I have tried this in a test project and it works fine, but in my actual web service I always get an error saying the index should be greater than 0. I have also tried using a regex to replace the characters, but that didn't work either.
result = Regex.Replace(result, "&(?!(amp|apos|quot|lt|gt);)", "hidoesthiswork?");
Upvotes: 5
Views: 17818
Reputation: 12375
You have HTML-encoded data.
Add this at the beginning of your method for a simple solution:
xml = HttpUtility.HtmlDecode(xml);
You can also use WebUtility.HtmlDecode (in System.Net) if you're using .NET 4.0+, as in this answer.
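As a rough illustration of the decode step, here is a minimal sketch of what it does to input shaped like the sample in the question. The class name DecodeDemo and the placeholder namespace urn:example are made up for this example; only WebUtility.HtmlDecode comes from the framework.

```csharp
using System;
using System.Net;

public static class DecodeDemo
{
    // Turns entity-encoded markup (&lt;, &gt;, &amp;, ...) back into ordinary text.
    public static string Decode(string encodedXml)
    {
        return WebUtility.HtmlDecode(encodedXml);
    }

    public static void Main()
    {
        // Placeholder namespace; the real payload uses the InfoPath one.
        string encoded = "&lt;my:Date xmlns:my=\"urn:example\"&gt;2014-08-15&lt;/my:Date&gt;";

        // Prints: <my:Date xmlns:my="urn:example">2014-08-15</my:Date>
        Console.WriteLine(Decode(encoded));
    }
}
```

Once the string is decoded, searches for "<" and ">" find real characters, so IndexOf no longer returns -1 and Substring stops throwing.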
In the long term, you should really be using an XML parser or something like LINQ to XML to access this data. Regexes are not an appropriate tool for this sort of structured data.
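A sketch of the parser-based approach, assuming the decoded payload is well-formed XML with a single root element. The ReportXml class and GetFieldValue helper are hypothetical names, not part of the original code; the field is matched on its local name so the my: prefix and namespace don't need to be known.

```csharp
using System;
using System.Net;
using System.Xml.Linq;

public static class ReportXml
{
    // Hypothetical helper: decodes the payload, parses it, and returns the text
    // of the first element whose local name equals `field`, or null if none matches.
    // XElement.Parse throws XmlException if the decoded text is not well-formed.
    public static string GetFieldValue(string field, string encodedXml)
    {
        string xml = WebUtility.HtmlDecode(encodedXml);
        XElement root = XElement.Parse(xml);

        foreach (XElement element in root.DescendantsAndSelf())
        {
            if (element.Name.LocalName == field)
            {
                return element.Value;
            }
        }
        return null;
    }
}
```

For the sample in the question, GetFieldValue("Date", xml) would return the element text, 2014-08-15. One caveat: decoding the whole string up front can produce invalid XML if a field value itself contains escaped characters such as &amp;amp;, so check what the service actually sends.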
Upvotes: 14