Ryan O'Neill

Reputation: 1790

Regex to match specific HTML tags

I need to match HTML tags (the whole tag), based on the tag name.

For script tags I have this:

<script.+src=.+(\.js|\.axd).+(</script>|>)

It correctly matches both tags in the following html:

<script src="Scripts/JScript1.js" type="text/javascript" />
<script type="text/javascript" src="Scripts/JScript2.js" />

However, when I do link tags with the following:

<link.+href=.+(\.css).+(</link>|>)

It matches all of this at once (i.e. it returns one match containing both items):

<link href="Stylesheets/StyleSheet1.css" rel="Stylesheet" type="text/css" />
<link href="Stylesheets/StyleSheet2.css" rel="Stylesheet" type="text/css" />

What am I missing here? The regexes are essentially identical except for the text they match.

Also, I know that regex is not a great tool for HTML parsing... I will probably end up using the HtmlAgilityPack, but this is driving me nuts and I want an answer if only for my own mental health!

Upvotes: 0

Views: 2787

Answers (2)

Chris

Reputation: 1753

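Each .+ wildcard is greedy and matches any run of characters, including the > that closes the first tag and the start of the next tag. This: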

<link.+href=.+(\.css).+(</link>|>)

Likely matches like this:

<link      => <link
.+         =>  href="Stylesheets/StyleSheet1.css" rel="Stylesheet" type="text/css" />
              <link 
href=      => href=
.+         => "Stylesheets/StyleSheet2
\.css      => .css
.+         => " rel="Stylesheet" type="text/css" /
</link>|>  => >
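A quick way to check this breakdown is to run the greedy pattern and count the matches. The sketch below uses Python's re module as a stand-in for the .NET Regex class (my assumption; the constructs used here behave the same way in both). It also assumes both tags sit on one line, because . does not match a newline by default.

```python
import re

# Assumption: both tags are on a single line, so "." can run from the
# first tag into the second ("." does not match "\n" by default).
greedy_html = (
    '<link href="Stylesheets/StyleSheet1.css" rel="Stylesheet" type="text/css" />'
    '<link href="Stylesheets/StyleSheet2.css" rel="Stylesheet" type="text/css" />'
)

# The pattern from the question, unchanged.
greedy_pattern = re.compile(r'<link.+href=.+(\.css).+(</link>|>)')

greedy_matches = [m.group(0) for m in greedy_pattern.finditer(greedy_html)]

# One match that spans both tags, rather than one match per tag.
print(len(greedy_matches))
print(greedy_matches[0] == greedy_html)
```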

Instead, consider using [^>]+ in place of .+, so that no wildcard can run past the > that ends the tag. Also, do you really care about the closing tag?

<link[^>]+href=[^>]+(\.css)[^>]+>
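To see the effect of the negated character class, here is a minimal sketch that runs the pattern above over the two link tags from the question. It uses Python's re module in place of .NET (an assumption on my part; the pattern needs no changes), and the variable names are only illustrative.

```python
import re

first_tag = '<link href="Stylesheets/StyleSheet1.css" rel="Stylesheet" type="text/css" />'
second_tag = '<link href="Stylesheets/StyleSheet2.css" rel="Stylesheet" type="text/css" />'
bounded_html = first_tag + "\n" + second_tag

# [^>]+ cannot cross a ">", so each match has to end inside its own tag.
bounded_pattern = re.compile(r'<link[^>]+href=[^>]+(\.css)[^>]+>')

bounded_matches = [m.group(0) for m in bounded_pattern.finditer(bounded_html)]

for tag in bounded_matches:
    print(tag)
```

One limitation worth knowing: a literal > inside an attribute value would cut the match short, which is one more reason to prefer a real HTML parser for anything beyond a quick check.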

Upvotes: 2

Ahmad Mageed

Reputation: 96477

The problem is that your regex is greedy. Each .+ matches as many characters as it can. Appending a ? makes it non-greedy (lazy), so it matches only as many characters as are needed to satisfy the pattern and does not run on into the next matching string.

Change the pattern to this: "<link.+?href=.+?(\.css).+?(</link>|>)"

Then you'll need to use Regex.Matches to get multiple matches and loop over them.
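As a rough sketch of that loop, the following uses Python's re.finditer as a stand-in for .NET's Regex.Matches (an assumption; the lazy quantifiers behave the same way in both engines). Both tags are placed on one line so that the greedy version would have merged them.

```python
import re

lazy_first = '<link href="Stylesheets/StyleSheet1.css" rel="Stylesheet" type="text/css" />'
lazy_second = '<link href="Stylesheets/StyleSheet2.css" rel="Stylesheet" type="text/css" />'
lazy_html = lazy_first + lazy_second

# Each ".+?" stops at the earliest point that lets the rest of the pattern match.
lazy_pattern = re.compile(r'<link.+?href=.+?(\.css).+?(</link>|>)')

# finditer plays the role of Regex.Matches: it yields every non-overlapping match.
lazy_matches = []
for match in lazy_pattern.finditer(lazy_html):
    lazy_matches.append(match.group(0))
    print(match.group(0))
```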

Upvotes: 1
