Reputation: 10635
I'm looking for a regex that will let me find all JavaScript and CSS link tags in a string, so that I can strip certain tags from a DotNetNuke (Yeah I know.... ouch!) page in an overridden render event.
I know about the HTML Agility Pack, and I've even read Jeff Atwood's blog entry, but unfortunately I don't have the luxury of a third-party library.
Any help would be appreciated.
Edit: I gave this a try to match a JavaScript entry, but it didn't work. Regexes are a dark art to me.
updatedPageSource = Regex.Replace(
pageSource,
String.Format("<script type=\"text/javascript\" src=\".*?{0}\"></script>",
name), "", RegexOptions.IgnoreCase);
Upvotes: 1
Views: 1289
Reputation: 2514
DISCLAIMER: Regex + HTML = ouch!
Your problem may be that you are not escaping the regex metacharacters in name (e.g. the dot metacharacter '.'). You may want to try this:
updatedPageSource = Regex.Replace(
pageSource,
String.Format("<script\\s+type=\"text/javascript\"\\s+src=\".*?{0}\"\\s*>\\s*</script>", Regex.Escape(name)),
"",
RegexOptions.IgnoreCase);
// Same pattern again, with src before type. Just one of the many reasons why you don't mix Regex with HTML:
updatedPageSource = Regex.Replace(
updatedPageSource,
String.Format("<script\\s+src=\".*?{0}\"\\s+type=\"text/javascript\"\\s*>\\s*</script>", Regex.Escape(name)),
"",
RegexOptions.IgnoreCase);
I also added optional whitespace here and there.
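To illustrate what escaping changes, here is a minimal sketch. The file name "myfile.js" and the class name EscapeDemo are assumptions made up for the example; only Regex.Escape and Regex.IsMatch come from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Returns true when the (possibly escaped) name matches somewhere in the input.
    public static bool Matches(string input, string namePattern)
    {
        return Regex.IsMatch(input, namePattern);
    }

    public static void Main()
    {
        string name = "myfile.js";            // hypothetical file name
        string escaped = Regex.Escape(name);  // the dot becomes \.

        // Unescaped, the dot matches any character, so "myfile-js" matches too.
        Console.WriteLine(Matches("myfile-js", name));     // True
        Console.WriteLine(Matches("myfile-js", escaped));  // False
        Console.WriteLine(Matches("myfile.js", escaped));  // True
    }
}
```

Note that an unescaped dot makes the pattern looser rather than stricter, so escaping mainly protects against removing the wrong tag.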
Upvotes: 1
Reputation: 30740
Don't forget to account for things like whitespace, other attributes, different orders of attributes (i.e. src="foo" type="bar" vs type="bar" src="foo"), and " vs ' quoting. Maybe this?
@"<\s*script\b.*?\bsrc=(""|').*?{0}\1.*?(/>|>\s*</\s*script\s*>)"
I went ahead and took out the type attribute. If you have the filename, you know what type of script it is anyway; plus, this accounts for tags where the src attribute comes first, or where the deprecated language attribute is used, or where type is omitted altogether (it's supposed to be there, but it isn't always). Note that I'm using the lazy .*? so that it doesn't match all the way to the last </script> in the page.
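As a rough sketch of how an order-independent, quote-independent pattern might be wired into Regex.Replace: the helper name RemoveScript, the class name, and the sample markup are all hypothetical. This variant is not the pattern above; it swaps in negated character classes ([^>] and [^"'>]) so that a match cannot run from one tag into the next, which is an assumption about what is wanted here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ScriptTagStripper
{
    // Removes <script> tags whose src ends with the given file name,
    // whatever the attribute order or quote style (hypothetical helper).
    public static string RemoveScript(string pageSource, string name)
    {
        string pattern = String.Format(
            @"<\s*script\b[^>]*?\bsrc\s*=\s*(""|')[^""'>]*?{0}\1[^>]*?(/>|>\s*</\s*script\s*>)",
            Regex.Escape(name));

        return Regex.Replace(pageSource, pattern, "", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string page =
            "<script src='/js/a.js'></script>" +
            "<script language=\"javascript\" src=\"/js/myfile.js\" type=\"text/javascript\"></script>" +
            "<p>x</p>";

        // Prints: <script src='/js/a.js'></script><p>x</p>
        Console.WriteLine(RemoveScript(page, "myfile.js"));
    }
}
```

This still treats HTML as flat text, so unusual markup (a > inside an attribute value, a data-src attribute, commented-out tags) can defeat it.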
Upvotes: 0
Reputation: 63126
I have a few comments on this. Your regex is close; the following has been tested to work:
<script type="text/javascript" src=".*myfile.js"></script>
I used the following test inputs:
<script type="text/javascript" src="myfile.js"></script>
<script type="text/javascript" src="/test/myfile.js"></script>
<script type="text/javascript" src="/test/Looky/myfile.js"></script>
However, I would caution against this approach: it takes time to parse, can be error-prone, and so on.
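One way the approach can go wrong, sketched under the assumption that two script tags sit on the same line: the greedy .* in the pattern above then runs from the first src to the last myfile.js, so an unrelated tag is removed as well. The class name, the Strip helper, and the keep.js file name are made up for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // The pattern from this answer, used as-is.
    public const string Pattern =
        "<script type=\"text/javascript\" src=\".*myfile.js\"></script>";

    // Removes every match of the pattern from the given markup.
    public static string Strip(string html)
    {
        return Regex.Replace(html, Pattern, "", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string single =
            "<script type=\"text/javascript\" src=\"/test/myfile.js\"></script>";
        string twoTags =
            "<script type=\"text/javascript\" src=\"keep.js\"></script>" +
            "<script type=\"text/javascript\" src=\"myfile.js\"></script>";

        Console.WriteLine(Strip(single).Length);   // 0: the tag is removed
        Console.WriteLine(Strip(twoTags).Length);  // 0: keep.js is removed too
    }
}
```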
Upvotes: 1