Reputation: 26929
I have a string that has a format like this:
<b>*GTPersonnel</b><table border=1><tr><td>ss2111</td></tr></table>
I want to process the data between the <td> tags and replace each occurrence of &#Blah; with the character it stands for. For example, I want &#115; to be replaced by just the character s, because 115 is the character code for s.
I could loop through the whole string, find the index of &#, find the index of the following ;, read the number in between, and look up the character for that code, but that is rather manual. I was wondering whether .NET offers something better that I can use for this purpose.
Upvotes: 1
Views: 138
Reputation: 4049
If you use XHTML, you can simply set the EntityHandling property of the XmlTextReader object to tell it to expand character entities automatically:
XmlTextReader reader = new XmlTextReader( "temp.xml" );
reader.EntityHandling = EntityHandling.ExpandCharEntities;
Then you can read your file with XmlTextReader or with the help of LINQ to XML. For example, if you have an XML file like this:
<?xml version="1.0" encoding="utf-8" ?>
<document>
<td>ss2111</td>
</document>
And you add the following lines after the code above:
while ( reader.Read() )
    if ( reader.NodeType == XmlNodeType.Text )
        Console.WriteLine( reader.Value );
You get the value ss2111 in your console window.
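If the markup is already in a string rather than a file, the same reader can be fed from a StringReader. Here is a minimal, self-contained sketch of that idea. The EntityDemo class, the ReadText helper and the sample input (which assumes both s characters were written as &#115;) are made up for illustration, and the input must be well-formed XML, so unquoted attributes such as border=1 would have to be fixed first.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class EntityDemo
{
    // Hypothetical helper: returns the concatenated text nodes of a
    // well-formed XML string, with numeric character references expanded.
    public static string ReadText(string xml)
    {
        var text = new StringBuilder();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // Ask the reader to expand references such as &#115; itself.
            reader.EntityHandling = EntityHandling.ExpandCharEntities;
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Text)
                    text.Append(reader.Value);
            }
        }
        return text.ToString();
    }

    static void Main()
    {
        // Assumed sample input, modelled on the <td> cell in the question.
        Console.WriteLine(ReadText("<document><td>&#115;&#115;2111</td></document>"));
    }
}
```

Collecting the text nodes into a StringBuilder instead of printing each one keeps the helper usable when a cell's text arrives as more than one node.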
Upvotes: 2
Reputation: 10610
A high-performance, reasonably straightforward way would be to set up a parallel StringBuilder (initialize its capacity to the length of the original string) and keep appending to it from the original with successive IndexOf("#") calls and the appropriate conversions. This way you're not doing any inserts or deletes, you're not resizing the StringBuilder's backing array (except at the end), and you're reading the original string only forwards. LINQifying it is possible with Aggregate(), but would be more trouble than it's worth and probably less clear.
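To make that concrete, here is a rough sketch of the forward-only StringBuilder approach, not a tuned implementation. The CharRefDecoder class and its Decode method are hypothetical names. The sketch searches for "&#" rather than a bare "#" so that stray # characters are left alone, it only handles decimal references, and it decodes the whole string rather than just the text inside <td> tags.

```csharp
using System;
using System.Globalization;
using System.Text;

static class CharRefDecoder
{
    // Hypothetical helper: replaces decimal references such as &#115;
    // with their characters in a single forward pass over the input.
    public static string Decode(string input)
    {
        // Decoded output is never longer than the input, so this
        // capacity avoids growing the backing array while appending.
        var output = new StringBuilder(input.Length);
        int pos = 0;
        while (pos < input.Length)
        {
            int start = input.IndexOf("&#", pos, StringComparison.Ordinal);
            if (start < 0) break;
            int end = input.IndexOf(';', start + 2);
            if (end < 0) break;

            // Copy the literal text that precedes the reference.
            output.Append(input, pos, start - pos);

            string digits = input.Substring(start + 2, end - start - 2);
            bool valid = int.TryParse(digits, NumberStyles.None,
                                      CultureInfo.InvariantCulture, out int code)
                         && code > 0 && code <= 0x10FFFF
                         && (code < 0xD800 || code > 0xDFFF);
            if (valid)
            {
                output.Append(char.ConvertFromUtf32(code));
                pos = end + 1;
            }
            else
            {
                // Not a decimal reference: keep "&#" and rescan after it.
                output.Append("&#");
                pos = start + 2;
            }
        }
        // Copy whatever is left after the last reference.
        output.Append(input, pos, input.Length - pos);
        return output.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Decode("<td>&#115;&#115;2111</td>"));
    }
}
```

Malformed sequences are passed through unchanged rather than rejected, which seemed the safer default for text scraped from HTML.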
Upvotes: 1