Reputation: 7707
I have the StripHTMLTags function below, which works fine in VBScript. Now I want the same function written in C#.
Function StripHTMLTags(ByVal sHTML)
    Dim objRegExp, sOutput
    sHTML = Replace(Replace(Trim(sHTML & ""), "&lt;", "<"), "&gt;", ">") ' ** PREVENT NULL ERRORS **
    If Len(sHTML) > 0 Then
        Set objRegExp = New RegExp
        With objRegExp
            .IgnoreCase = True
            .Global = True
            .Pattern = "<[^>]+>"
            ' ** REPLACE ALL HTML TAG MATCHES WITH THE EMPTY STRING **
            sOutput = .Replace(sHTML, "")
        End With
        Set objRegExp = Nothing
        StripHTMLTags = sOutput
    Else
        StripHTMLTags = ""
    End If
End Function
Please suggest how to do this, as it is really confusing me.
Upvotes: 1
Views: 833
Reputation: 1884
Have you tried Regex.Replace?
Example:
// Requires: using System.Text.RegularExpressions;

// Simple pattern: removes everything from a '<' up to the next '>'.
static string stripHTMLTags1(string html)
{
    string pattern = @"<[^>]+>";
    var expression = new Regex(pattern);
    return expression.Replace(html, String.Empty);
}

// Stricter pattern: matches only tag-shaped text with optional attributes.
static string stripHTMLTags2(string html)
{
    // From http://gskinner.com/RegExr/
    string pattern = @"</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>";
    var expression = new Regex(pattern);
    return expression.Replace(html, String.Empty);
}
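The VBScript version in the question also coerces a null argument to an empty string and trims it before matching, which the two methods above do not do (passing null to Regex.Replace throws). A minimal sketch of a closer port might look like the following. The class name HtmlUtil and the method name StripHtmlTags are made up for illustration, and the sketch covers only the null handling, trimming, and tag removal steps.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class; the name is an assumption, not from the original post.
public static class HtmlUtil
{
    // Built once and reused. Same pattern as the VBScript original;
    // IgnoreCase mirrors .IgnoreCase = True, although this pattern has no letters.
    private static readonly Regex TagPattern =
        new Regex("<[^>]+>", RegexOptions.IgnoreCase);

    public static string StripHtmlTags(string html)
    {
        // Mirrors Trim(sHTML & ""): a null input is treated as an empty string.
        // Note: C# Trim() removes all leading/trailing whitespace, VBScript Trim only spaces.
        string input = (html ?? string.Empty).Trim();

        if (input.Length == 0)
        {
            return string.Empty;
        }

        // Regex.Replace replaces every match, which corresponds to .Global = True.
        return TagPattern.Replace(input, string.Empty);
    }
}
```

Calling HtmlUtil.StripHtmlTags(null) then returns an empty string instead of throwing, matching the Else branch of the VBScript function.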
Upvotes: 1
Reputation: 2700
Here are regular expressions to strip tags from HTML input:
Also see this Stack Overflow post which goes into more detail about using C# to strip HTML tags.
Chris.
Upvotes: 0