Reputation: 2374
I have the following two examples of HTML:
<a href="http://foo.com">User</a>: <a style="color:#333" href="http://foo.com/word"></a> blue elephant ·
<a href="http://foo.com">User</a>: <a style="color:#333" href="http://foo.com/word">@<b>word</b></a> blue elephant ·
I am trying to parse this using C# and write the result to a CSV file. It works to an extent; however, when the HTML contains the '@' symbol, it either leaves the CSV cell blank or omits the word preceded by '@'. The main part I am trying to get is "@word blue elephant", but this brings back a blank cell, whereas the first HTML example brings back "blue elephant" as desired.
I am using the following technique to do this:
string[] comm = System.Text.RegularExpressions.Regex.Split(content[1], "<a");
How can I alter this to work for the second html example?
Upvotes: 0
Views: 1567
Reputation: 125538
You want to use a proper HTML parser, like the one in the HTML Agility Pack, in this situation (and save yourself from invoking the wrath of Cthulhu).
Some examples of how to use it
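As a rough illustration of the parser approach, here is a minimal sketch using the HtmlAgilityPack NuGet package. The class name CommentExtractor and the method ExtractComment are hypothetical names made up for this example. It also assumes that the user name never contains a colon and that the trailing "·" is just a separator to strip; adjust both if your real data differs.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical helper for illustration; not part of the library.
public static class CommentExtractor
{
    // Returns the text that follows the "User:" link, with tags removed.
    public static string ExtractComment(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerText concatenates every text node and drops the markup,
        // so "@<b>word</b>" becomes "@word".
        string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);

        // Assumption: the first ':' in the visible text ends the user name.
        int colon = text.IndexOf(':');
        string comment = colon >= 0 ? text.Substring(colon + 1) : text;

        // Assumption: surrounding whitespace and the "·" separator are noise.
        return comment.Trim(' ', '·', '\t', '\r', '\n');
    }
}

public static class Program
{
    public static void Main()
    {
        string html =
            "<a href=\"http://foo.com\">User</a>: " +
            "<a style=\"color:#333\" href=\"http://foo.com/word\">@<b>word</b></a> blue elephant ·";

        Console.WriteLine(CommentExtractor.ExtractComment(html));
    }
}
```

Because the parser works on text nodes rather than on raw markup, the same call should handle both sample lines from the question: the empty link in the first one simply contributes no text.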
Upvotes: 6