Regular Expression for Extracting Script Tags

Question

I am trying to write a regular expression in C# to remove all script tags and anything contained within them.

So far I have come up with the following: \<([^:]*?:)?script\>[^(\)]*?\, however this does not work.

I'll break it up and explain my thinking in each section:

\<([^:]*?:)?script\>

Here I am trying to state that it should match any script element, even if it is prefixed with a namespace. I have also added this to the closing tag.

[^(\)]*?

Here I am trying to state that it should allow anything to be contained within the tags except for , , etc.

Here I am stating that it should have a closing tag.

Can anyone spot where I am going wrong?

Tim Robinson · Accepted Answer

You can't parse HTML with regular expressions.

Use the HTML Agility Pack instead.
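
As a rough illustration of that suggestion, here is a minimal sketch, assuming the HtmlAgilityPack NuGet package is installed. The class name ScriptStripper and the method RemoveScripts are hypothetical names chosen for this example, and the check for a namespace prefix (any element name ending in ":script") is an assumption about what the question is after, not something the library prescribes.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

static class ScriptStripper // hypothetical name, for illustration only
{
    public static string RemoveScripts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Collect plain <script> elements and, as an assumption about the
        // question's intent, elements whose name ends in ":script".
        // ToList() materializes the matches first, so the tree is not
        // modified while it is still being enumerated.
        var scripts = doc.DocumentNode
            .Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Element &&
                        (n.Name.Equals("script", StringComparison.OrdinalIgnoreCase) ||
                         n.Name.EndsWith(":script", StringComparison.OrdinalIgnoreCase)))
            .ToList();

        foreach (var node in scripts)
        {
            node.Remove(); // removes the element together with its contents
        }

        return doc.DocumentNode.OuterHtml;
    }

    static void Main()
    {
        const string html = "<div><script>alert(1);</script><p>Hello</p></div>";
        Console.WriteLine(RemoveScripts(html));
    }
}
```

Because the parser builds a document tree, the removal does not depend on what the script body contains, which is where a negated character class such as the one in the question tends to break down.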

Answers (2)

But don't do it please
