Reputation: 4126
I want to strip the span tags from an HTML string.
I have an HTML string:
<a href=\"http://www.dr.dk/roskilde\"><span>Roskilde</span><span>Festival</span></a>
I need to strip it down to: Roskilde Festival.
At the moment I have a regex that should find all span tags, but it's failing:
System.Collections.Specialized.StringCollection sc = new System.Collections.Specialized.StringCollection();
sc.Add(@"/<\s*\/?\s*span\s*.*?>/g");
foreach (string s in sc)
{
k = System.Text.RegularExpressions.Regex.Replace(pContent, s, "", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
}
k = System.Text.RegularExpressions.Regex.Replace(pContent, @" ", @" ");
Any ideas?
P.S. I don't want to use Html Agility Pack.
Upvotes: 0
Views: 2144
Reputation: 77505
Regular expressions are not the best way to process HTML. Use an HTML parser instead, because regular expressions do not understand HTML nesting.
If you do use a regex, consider negated character classes, e.g. <whatever[^>]*>
And I guess you copied this from somewhere, but your regexp is probably not proper C# syntax (the extra leading / and trailing /g are JavaScript-style delimiters, which .NET treats as literal characters). Reread a tutorial on regular expressions in C#! Try this string:
Example /<span>/g does this tag get removed?
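To see why the delimiters matter, here is a minimal sketch that runs the pattern from the question against both the original HTML and the test string above. The class and method names (`DelimiterDemo`, `Strip`) are made up for illustration. Since .NET reads the leading `/` and trailing `/g` as literal characters, the pattern should only match where that literal text appears.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DelimiterDemo
{
    // The pattern exactly as written in the question, delimiters included.
    public const string QuestionPattern = @"/<\s*\/?\s*span\s*.*?>/g";

    // Hypothetical helper: applies the question's pattern to the input.
    public static string Strip(string input) =>
        Regex.Replace(input, QuestionPattern, "", RegexOptions.IgnoreCase);

    public static void Main()
    {
        string html = "<a href=\"http://www.dr.dk/roskilde\"><span>Roskilde</span><span>Festival</span></a>";
        string probe = "Example /<span>/g does this tag get removed?";

        // The HTML never contains "/" directly followed by "<", so nothing matches.
        Console.WriteLine(Strip(html) == html);   // True

        // Here the literal text "/<span>/g" is present, so that whole run is removed.
        Console.WriteLine(Strip(probe));
    }
}
```

In other words, the tag in the test string only disappears because the slashes and the `g` happen to surround it.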
What you probably meant to use was:
sc.Add(@"</?span( [^>]*|/)?>");
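A rough sketch of how that pattern could be used to get from the HTML in the question to "Roskilde Festival". Two things here are assumptions rather than part of the answer: the leftover `<a>` wrapper is removed with a generic `<[^>]*>` pattern, and span tags are replaced with a space (not an empty string) so the two words do not run together. The names `SpanStripper` and `ToPlainText` are hypothetical. Note also that the loop in the question passes `pContent` to every `Replace` call and then overwrites `k` again afterwards, so earlier results are discarded; the sketch chains each step on the previous result instead.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpanStripper
{
    // Pattern from the answer: opening, closing, or self-closing span tags.
    private static readonly Regex SpanTag =
        new Regex(@"</?span( [^>]*|/)?>", RegexOptions.IgnoreCase);

    // Assumption: any remaining tag (here the <a> wrapper) should be dropped too.
    private static readonly Regex AnyTag = new Regex(@"<[^>]*>");

    public static string ToPlainText(string html)
    {
        // Replace span tags with a space so adjacent words stay separated.
        string text = SpanTag.Replace(html, " ");
        // Feed the previous result into the next step, not the original input.
        text = AnyTag.Replace(text, "");
        // Collapse runs of whitespace and trim the ends.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        string html = "<a href=\"http://www.dr.dk/roskilde\"><span>Roskilde</span><span>Festival</span></a>";
        Console.WriteLine(ToPlainText(html));  // Roskilde Festival
    }
}
```

This works for flat markup like the example; for nested or malformed HTML the caveat about parsers at the top of this answer still applies.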
Upvotes: 3