James South

Reputation: 10635

Regex to get all JavaScript tags in C#

I'm looking for a regex that will let me find all JavaScript and CSS link tags in a string, so that I can strip certain tags from a DotNetNuke (yeah, I know... ouch!) page in an overridden render event.

I know about the HTML Agility Pack, and I've even read Jeff Atwood's blog entry, but unfortunately I don't have the luxury of a third-party library.

Any help would be appreciated.

Edit: I gave this a try to remove a JavaScript entry, but it didn't work. Regexes are a dark art to me.

updatedPageSource = Regex.Replace(
    pageSource,
    String.Format("<script type=\"text/javascript\" src=\".*?{0}\"></script>", name),
    "",
    RegexOptions.IgnoreCase);

Upvotes: 1

Views: 1289

Answers (3)

MarioVW

Reputation: 2514

DISCLAIMER: Regex + HTML = ouch!

Your problem may be that you are not escaping the Regex metacharacters from name (e.g. the dot metacharacter '.'). You may want to try this:

updatedPageSource = Regex.Replace(
    pageSource, 
    String.Format("<script\\s+type=\"text/javascript\"\\s+src=\".*?{0}\"\\s*>\\s*</script>", Regex.Escape(name)),
    "",
    RegexOptions.IgnoreCase);

// Just one of the many reasons why you don't mix Regex with HTML:
updatedPageSource = Regex.Replace(
    updatedPageSource, 
    String.Format("<script\\s+src=\".*?{0}\"\\s+type=\"text/javascript\"\\s*>\\s*</script>", Regex.Escape(name)),
    "",
    RegexOptions.IgnoreCase);

I also allowed for flexible whitespace between the attributes and around the closing tag.
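To make the escaping point concrete, here is a minimal sketch. The class name, the `BuildPattern` helper, and the file names are made up for illustration. It builds the first pattern above with and without Regex.Escape and runs both against a tag whose file name differs only where the unescaped dot sits.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical helper: builds the script-tag pattern from the answer above.
    // 'escape' toggles whether the file name goes through Regex.Escape first.
    public static string BuildPattern(string name, bool escape)
    {
        string fileName = escape ? Regex.Escape(name) : name;
        return string.Format(
            "<script\\s+type=\"text/javascript\"\\s+src=\".*?{0}\"\\s*>\\s*</script>",
            fileName);
    }

    public static void Main()
    {
        // Assumed input: the page references "jquery-min.js", not "jquery.min.js".
        string html = "<script type=\"text/javascript\" src=\"/js/jquery-min.js\"></script>";
        string name = "jquery.min.js";

        // Unescaped, the first '.' matches any character, so '-' is accepted.
        Console.WriteLine(Regex.IsMatch(html, BuildPattern(name, false), RegexOptions.IgnoreCase)); // True

        // Escaped, the dots must match literally, so the wrong file is left alone.
        Console.WriteLine(Regex.IsMatch(html, BuildPattern(name, true), RegexOptions.IgnoreCase));  // False
    }
}
```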

Upvotes: 1

Justin Morgan

Reputation: 30740

Don't forget to account for things like whitespace, other attributes, different orders of attributes (i.e. src="foo" type="bar" vs type="bar" src="foo"), and " vs ' quoting. Maybe this?

@"<\s*script\b.*?\bsrc=(""|').*?{0}\1.*?(/>|>\s*</\s*script\s*>)"

I went ahead and took out the type attribute. If you have the filename, you know what type of script it is anyway; plus, this accounts for tags where the src attribute comes first, or they used the deprecated language attribute, or they omitted type altogether (HTML 4 requires it, but it isn't always there). Note that I'm using the lazy .*? so that it doesn't match all the way to the last </script> in the page.
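As a rough sketch of how a pattern along these lines could be wired into a helper: the class and method names here are hypothetical, and the pattern is a variant that uses [^>]* instead of .*? so a match cannot spill past the end of the opening tag. It assumes the file name sits at the very end of the src value (no query string).

```csharp
using System;
using System.Text.RegularExpressions;

public static class ScriptTagStripper
{
    // Hypothetical helper: removes <script> tags whose src value ends with fileName.
    // Handles either quote style, any attribute order, and self-closing tags.
    public static string RemoveScript(string html, string fileName)
    {
        string pattern = string.Format(
            @"<script\b[^>]*?\bsrc\s*=\s*([""'])[^""']*?{0}\1[^>]*?(?:/>|>\s*</script\s*>)",
            Regex.Escape(fileName));
        return Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Assumed sample markup: three tags, two of which reference a.js.
        string html =
            "<script src='/js/a.js' type='text/javascript'></script>" +
            "<script type=\"text/javascript\" src=\"/js/b.js\"></script>" +
            "<script language=\"javascript\" src=\"/js/a.js\" />";

        // Prints: <script type="text/javascript" src="/js/b.js"></script>
        Console.WriteLine(RemoveScript(html, "a.js"));
    }
}
```

Because any path prefix is allowed before the file name, "a.js" would also match "data.js"; anchoring on a preceding slash or quote is one way to tighten that if it matters.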

Upvotes: 0

Mitchel Sellers

Reputation: 63126

I have a few comments on this. Your regex is close; the following has been tested to work:

<script type="text/javascript" src=".*myfile.js"></script>

I used the following test inputs

<script type="text/javascript" src="myfile.js"></script>
<script type="text/javascript" src="/test/myfile.js"></script>
<script type="text/javascript" src="/test/Looky/myfile.js"></script>

However, I would caution against this approach: it takes time to parse the whole page, and it is error-prone.
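The check above can be replayed with a small sketch like the following. The class and method names are made up, and the dot in the file name is escaped here, which the pattern as quoted does not do. The last line illustrates the error-prone part: the same tag written with single quotes and a different attribute order is not matched.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // The pattern quoted above, with the '.' in "myfile.js" escaped (a change made in this sketch).
    public const string Pattern =
        "<script type=\"text/javascript\" src=\".*myfile\\.js\"></script>";

    // Counts how many of the given inputs contain a match for Pattern.
    public static int CountMatches(string[] inputs)
    {
        int count = 0;
        foreach (string input in inputs)
        {
            if (Regex.IsMatch(input, Pattern, RegexOptions.IgnoreCase))
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        string[] inputs =
        {
            "<script type=\"text/javascript\" src=\"myfile.js\"></script>",
            "<script type=\"text/javascript\" src=\"/test/myfile.js\"></script>",
            "<script type=\"text/javascript\" src=\"/test/Looky/myfile.js\"></script>"
        };
        Console.WriteLine(CountMatches(inputs)); // 3

        // Same script, different quoting and attribute order: the pattern misses it.
        string variant = "<script src='myfile.js' type='text/javascript'></script>";
        Console.WriteLine(Regex.IsMatch(variant, Pattern, RegexOptions.IgnoreCase)); // False
    }
}
```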

Upvotes: 1
