Annatar

Reputation: 381

vbScript to PHP Translation: Regular Expression to remove HTML tags

I'm translating a function from Classic ASP (VBScript) into PHP. I've made an attempt, but I'm not certain my code is correct, so I'd like to ask others.

The VBScript function below uses a regular expression to remove HTML tags. (The regular expression came from http://regexplib.com.) Here's the VBScript code to be translated:

Function StripTags(ByVal szString,ByVal szTags)
If szTags = "" Then szTags = "[a-zA-Z]+"
Dim regEx : Set regEx = New RegExp
regEx.IgnoreCase = True
regEx.Global = True
' tag to remove (based on http://regexplib.com/REDetails.aspx?regexp_id=211)
regEx.Pattern = "</?("+szTags+")(\s+\w+=(\w+|""[^""]*""|'[^']*'))*\s*?/?>"
StripTags = regEx.Replace(szString, "")
Set regEx = Nothing
End Function

I discovered that PHP has a built-in function called strip_tags($szString). Does this function do the same thing as the code above?

I also found a more complicated PHP HTML removal function on this board, but I'm not sure if it does the same thing:

function StripTags($szString,$szTags){
$szString = preg_replace(
array(
// Remove invisible content
'@<head[^>]*?>.*?</head>@siu', 
'@<style[^>]*?>.*?</style>@siu',
'@<script[^>]*?>.*?</script>@siu',
'@<object[^>]*?>.*?</object>@siu',
'@<embed[^>]*?>.*?</embed>@siu',
'@<applet[^>]*?>.*?</applet>@siu',
'@<noframes[^>]*?>.*?</noframes>@siu',
'@<noscript[^>]*?>.*?</noscript>@siu',
'@<noembed[^>]*?>.*?</noembed>@siu',
// Add line breaks before and after blocks
'@</?((address)|(blockquote)|(center)|(del))@iu',
'@</?((div)|(h[1-9])|(ins)|(isindex)|(p)|(pre))@iu',
'@</?((dir)|(dl)|(dt)|(dd)|(li)|(menu)|(ol)|(ul))@iu',
'@</?((table)|(th)|(td)|(caption))@iu',
'@</?((form)|(button)|(fieldset)|(legend)|(input))@iu',
'@</?((label)|(select)|(optgroup)|(option)|(textarea))@iu',
'@</?((frameset)|(frame)|(iframe))@iu',),
array(
' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',
"\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0",
"\n\$0",
),
$szString );
$szString = strip_tags( $szString);
return $szString;
}

Can somebody tell me if the PHP function above does the same thing as the VBScript function?

Upvotes: 1

Views: 714

Answers (2)

Alex Howansky

Reputation: 53533

FWIW, strip_tags() can be told to keep certain tags by passing them as the second parameter (a string such as '<p><a>'; an array is also accepted as of PHP 7.4). That said, you can't reliably parse HTML with regex, and you're ultimately better off with something like the HTML Tidy extension.
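
To illustrate the second parameter, here is a minimal sketch; the sample markup and variable names are made up for the example. Note that it works the opposite way round from szTags in the VBScript function: szTags names the tags to remove, while the strip_tags() parameter names the tags to keep.

```php
<?php
// Sample input, invented for this example.
$html = '<p>Hello <b>world</b> <a href="#">link</a></p>';

// No second argument: every tag is removed.
$plain = strip_tags($html);

// Second argument lists the tags to KEEP; everything else is removed.
// The string form shown here works on all PHP versions.
$keepBold = strip_tags($html, '<b>');

echo $plain, "\n";     // Hello world link
echo $keepBold, "\n";  // Hello <b>world</b> link
```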

Edit: Ah, here's the other link I was looking for: HTML Purifier
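
For comparison, a literal translation of the VBScript function from the question might look roughly like the sketch below. The function name stripTagsRegex is hypothetical, and the pattern is simply the one from the question carried over to preg_replace(), so it inherits the same limitations (for example, the default tag-name pattern [a-zA-Z]+ does not match names containing digits, such as h1).

```php
<?php
// Hypothetical helper mirroring the VBScript StripTags from the question.
// $tags is inserted as regex source (e.g. 'b|i'), exactly as in the original,
// so it must not contain the '@' delimiter or untrusted input.
function stripTagsRegex(string $html, string $tags = ''): string
{
    // Same default as the VBScript version: match any alphabetic tag name.
    if ($tags === '') {
        $tags = '[a-zA-Z]+';
    }

    // The 'i' modifier corresponds to regEx.IgnoreCase = True.
    // preg_replace() replaces all matches, like regEx.Global = True.
    $pattern = '@</?(' . $tags . ')(\s+\w+=(\w+|"[^"]*"|\'[^\']*\'))*\s*?/?>@i';

    // preg_replace() returns null on error; fall back to the input unchanged.
    return preg_replace($pattern, '', $html) ?? $html;
}

echo stripTagsRegex('<p class="x">Hi <b>there</b></p>'), "\n";  // Hi there
echo stripTagsRegex('<p>Hi <b>there</b></p>', 'b'), "\n";       // <p>Hi there</p>
```

Unlike strip_tags(), this removes only the tags named in the second argument and leaves everything else in place, which is what the VBScript original does.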

Upvotes: 0

frank

Reputation: 11

I think you could just change the PHP delimiters to allow for the ASP/VBScript ones. You might be better off not translating a Classic ASP page at all, but rather trying to install Classic ASP support on Apache.

Is there a good reason to make the switch?

Upvotes: 1
