Reputation: 46641
I have the following string:
<TD><!-- 1.91 -->6949<!-- 9.11 --></TD>
I want to end up with:
<TD>6949</TD>
but instead I end up with just the tags and no information:
<TD></TD>
This is the regular expression I am using:
RegEx.Replace("<TD><!-- 1.91 -->6949<!-- 9.11 --></TD>","<!--.*-->","")
Can someone explain how to keep the numbers and remove just the comments? Also, if possible, can someone explain why this is happening?
Upvotes: 0
Views: 340
Reputation: 76238
Parsing HTML with regex is always going to be tricky. Instead, use something like HTML Agility Pack, which lets you query and parse HTML in a structured manner.
Upvotes: 2
Reputation: 32514
.* is greedy, so it matches as many characters as possible: here, everything from the opening of the first comment to the end of the second. Changing it to .*? fixes this, because the ? makes the match lazy, which is to say it matches as few characters as possible. Using [^>]* also works for this input, because it cannot match past the first > character.
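To make the difference concrete, here is a minimal sketch comparing the three patterns on the string from the question. It uses Python's re module rather than the .NET call in the question, on the assumption that greedy and lazy quantifiers behave the same way for these patterns; the variable names are only illustrative.

```python
import re

# The exact input string from the question.
html = "<TD><!-- 1.91 -->6949<!-- 9.11 --></TD>"

# Greedy: .* runs from the first <!-- to the LAST -->, swallowing 6949.
greedy = re.sub(r"<!--.*-->", "", html)

# Lazy: .*? stops at the FIRST --> it can reach, so each comment is removed separately.
lazy = re.sub(r"<!--.*?-->", "", html)

# Negated class: [^>]* cannot cross a '>', so it also stops at each comment's end.
negated = re.sub(r"<!--[^>]*-->", "", html)

print(greedy)   # <TD></TD>
print(lazy)     # <TD>6949</TD>
print(negated)  # <TD>6949</TD>
```

Note that the negated-class version breaks if a comment itself contains a > character, and that . does not match newlines by default, so comments spanning several lines need re.DOTALL in Python (RegexOptions.Singleline in .NET).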
Upvotes: 2
Reputation: 887877
.* is a greedy quantifier, which matches as much as possible. It is matching everything up to the last -->. Change it to .*?, which is a lazy quantifier.
Upvotes: 3