Bohn

Reputation: 26929

What is a clean, well-performing way to process a string?

I have a string that has a format like this: <b>*GTPersonnel</b><table border=1><tr><td>&#115;&#115;&#50;&#49;&#49;&#49;</td></tr></table>

I want to process the data between the <td> tags and replace each occurrence of &#Blah; with the character it represents. For example, &#115; should be replaced by just the character s, because 115 is the character code for s.

I could loop through the whole string, find the index of &#, find the index of ;, read the characters in between, and look up the character for that code... but that is a fairly manual algorithm. I was wondering whether .NET offers something better that I can use for this purpose.

Upvotes: 1

Views: 138

Answers (2)

whyleee

Reputation: 4049

If your input is XHTML (that is, well-formed XML), you can simply set the EntityHandling property on an XmlTextReader object to tell it to expand character entities automatically:

XmlTextReader reader = new XmlTextReader( "temp.xml" );
reader.EntityHandling = EntityHandling.ExpandCharEntities;

Then you can read your file with XmlTextReader or with the help of LINQ to XML. For example, if you have an XML file like this:

<?xml version="1.0" encoding="utf-8" ?>
<document>
    <td>&#115;&#115;&#50;&#49;&#49;&#49;</td>
</document>

And you add the following lines after the reader setup code above:

while ( reader.Read() )
    if ( reader.NodeType == XmlNodeType.Text )
        Console.WriteLine( reader.Value );

You get the value ss2111 in your console window.
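Since the question starts from a string rather than a file, here is a minimal sketch of the same idea reading from a StringReader. The class and method names (EntityReaderDemo, ReadText) are made up for illustration, and it assumes the input is well-formed XML; the sample string in the question (unquoted border=1, more than one top-level element) would need to be cleaned up or wrapped before a reader like this accepts it.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class EntityReaderDemo
{
    // Hypothetical helper: concatenates the text nodes of a well-formed
    // XML fragment, letting the reader expand character entities.
    public static string ReadText(string xml)
    {
        var result = new StringBuilder();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.EntityHandling = EntityHandling.ExpandCharEntities;
            while (reader.Read())
            {
                // Only text nodes carry the decoded characters we want.
                if (reader.NodeType == XmlNodeType.Text)
                    result.Append(reader.Value);
            }
        }
        return result.ToString();
    }
}
```

Calling EntityReaderDemo.ReadText("<td>&#115;&#115;&#50;&#49;&#49;&#49;</td>") should give back ss2111, the same value as the file-based version above.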

Upvotes: 2

Mark Sowul

Reputation: 10610

A high-performance, reasonably straightforward way would be to set up a parallel StringBuilder (initialize its capacity to the length of the original) and keep appending to it from the original string, using successive IndexOf("&#") calls and the appropriate conversions. This way you're not doing any inserts or deletes, you're not resizing the StringBuilder's backing array (except at the end), and you're only reading the original string forwards. LINQifying it is possible with Aggregate(), but would be more trouble than it's worth and probably less clear.
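A rough sketch of that approach might look like the following. The names (EntityDecoder, Decode) are hypothetical, and it makes some assumptions the answer does not spell out: only decimal references of the form &#NNN; are handled, anything that does not parse as a valid code point is copied through unchanged, and the whole string is decoded rather than just the text between <td> tags.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class EntityDecoder
{
    // Hypothetical helper: single forward pass, appending to a
    // StringBuilder pre-sized to the input length.
    public static string Decode(string input)
    {
        var output = new StringBuilder(input.Length);
        int pos = 0;
        while (pos < input.Length)
        {
            int start = input.IndexOf("&#", pos, StringComparison.Ordinal);
            if (start < 0) break;                  // no more references
            int end = input.IndexOf(';', start + 2);
            if (end < 0) break;                    // unterminated reference

            // Copy the literal text that precedes the reference.
            output.Append(input, pos, start - pos);

            string digits = input.Substring(start + 2, end - start - 2);
            int code;
            bool valid =
                int.TryParse(digits, NumberStyles.None,
                             CultureInfo.InvariantCulture, out code)
                && code > 0 && code <= 0x10FFFF
                && (code < 0xD800 || code > 0xDFFF); // skip surrogates

            if (valid)
            {
                output.Append(char.ConvertFromUtf32(code));
                pos = end + 1;
            }
            else
            {
                // Not a decimal reference: keep "&#" and rescan after it.
                output.Append("&#");
                pos = start + 2;
            }
        }
        // Copy whatever is left after the last reference.
        output.Append(input, pos, input.Length - pos);
        return output.ToString();
    }
}
```

With the question's cell content, EntityDecoder.Decode("<td>&#115;&#115;&#50;&#49;&#49;&#49;</td>") should return <td>ss2111</td>. Because the decoded text is never longer than the input, the initial capacity should be enough to avoid regrowing the buffer.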

Upvotes: 1
