Reputation: 1082
I'm parsing an HTML table in C# using Html Agility Pack, and the table contains non-breaking spaces.
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(page);
Here page is a string containing a table with &nbsp; entities within the cell text, for example:
<td>&nbsp;test</td>
<td>number = 123&nbsp;</td>
When I use SelectSingleNode(".//td").InnerText, the result contains these special characters, but I want to ignore them.
Is there an elegant way to ignore them (with or without the help of Html Agility Pack) without modifying the source table?
Upvotes: 1
Views: 2197
Reputation: 35643
The non-breaking space you call a "special character" is a valid character that can legitimately appear in text, just as "fancy quotes", em dashes, etc. can.
Often, though, we want to treat certain characters as equivalent, for example a non-breaking space and an ordinary space.
However, this is not something Html Agility Pack can help with. You need to use something like string.Replace or your own canonicalization function to do this.
I would suggest something like:
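static string CleanupStringForMyApp(string s)
{
    // Replace characters with their equivalents:
    // U+00A0 (non-breaking space, char code 160) becomes an ordinary space.
    // (string.FromCharCode does not exist in C#; use a char literal instead.)
    s = s.Replace('\u00A0', ' ');
    // Add any more replacements you want to do here
    return s;
}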
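If you need more than one substitution, the same idea generalizes to a lookup table. Below is a minimal sketch, assuming you also want the "fancy quotes" and dashes mentioned above folded to ASCII; the table contents and the name Canonicalize are hypothetical choices to adapt to your app. It is written as C# 9+ top-level statements so it runs as a standalone program.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical lookup table: which characters count as "equivalent" is an
// application-specific decision, not something defined by a standard.
var equivalents = new Dictionary<char, string>
{
    ['\u00A0'] = " ",   // non-breaking space -> ordinary space
    ['\u2013'] = "-",   // en dash -> hyphen-minus
    ['\u2014'] = "-",   // em dash -> hyphen-minus
    ['\u2018'] = "'",   // left single quotation mark
    ['\u2019'] = "'",   // right single quotation mark
    ['\u201C'] = "\"",  // left double quotation mark
    ['\u201D'] = "\"",  // right double quotation mark
};

// Builds the result in one pass, substituting any character found in the table
// and copying every other character unchanged.
string Canonicalize(string s)
{
    var sb = new StringBuilder(s.Length);
    foreach (char c in s)
    {
        if (equivalents.TryGetValue(c, out var replacement))
            sb.Append(replacement);
        else
            sb.Append(c);
    }
    return sb.ToString();
}

Console.WriteLine("[" + Canonicalize("number = 123\u00A0") + "]"); // prints [number = 123 ]
```

A single pass with a StringBuilder avoids allocating a new string for every chained Replace call, which only starts to matter once the table grows beyond a handful of entries.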
Upvotes: 0
Reputation: 14618
You could use HttpUtility.HtmlDecode (in the System.Web namespace):
using System.Web;

string foo = HttpUtility.HtmlDecode("Special char: &nbsp;");
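This gives you the string "Special char: " followed by the decoded character U+00A0 (the non-breaking space itself) instead of the literal &nbsp; entity.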
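Decoding turns the &nbsp; entity into the character U+00A0, not into an ordinary space, so to actually ignore it you still need a replace or trim step afterwards. Below is a minimal sketch, assuming the cell text arrives as a string like the one InnerText returned in the question. It swaps in System.Net.WebUtility.HtmlDecode, which does not need a System.Web reference, for HttpUtility.HtmlDecode; CleanCellText is a hypothetical helper name.

```csharp
using System;
using System.Net;

// Hypothetical helper: decode entities such as &nbsp; and &amp;, turn the
// resulting non-breaking spaces (U+00A0) into ordinary spaces, then trim.
static string CleanCellText(string rawInnerText)
{
    string decoded = WebUtility.HtmlDecode(rawInnerText); // "&nbsp;" -> '\u00A0'
    // Trim() alone would also strip leading/trailing U+00A0, because
    // char.IsWhiteSpace treats it as whitespace; the Replace additionally
    // normalizes any non-breaking spaces in the middle of the text.
    return decoded.Replace('\u00A0', ' ').Trim();
}

Console.WriteLine("[" + CleanCellText("&nbsp;test") + "]");         // prints [test]
Console.WriteLine("[" + CleanCellText("number = 123&nbsp;") + "]"); // prints [number = 123]
```

Whether to trim or only replace depends on whether leading and trailing spacing in a cell carries meaning for your data.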
Upvotes: 2