Reputation: 3147
I have a VB.NET class that cleans some HTML before emailing the results.
Here is a sample of the HTML I need to remove:
<div class="RemoveThis">
Blah blah blah<br />
Blah blah blah<br />
Blah blah blah<br />
<br />
</div>
I already use Regex for most of this work. What would a regular expression look like that replaces the block above with nothing?
I tried the following, but something is wrong:
'html has all of my text
html = Regex.Replace(html, "<div.*?class=""RemoveThis"">.*?</div>", "", RegexOptions.IgnoreCase)
Thanks.
Upvotes: 2
Views: 3547
Reputation: 839234
Try:
RegexOptions.IgnoreCase Or RegexOptions.Singleline
The RegexOptions.Singleline option changes the meaning of the dot from 'match anything except a newline' to 'match anything'.
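As a rough illustration of that difference, the sketch below runs the pattern from the question with and without Singleline on a small multi-line sample. The module name SinglelineDemo, the helper Clean, and the sample markup are made up for this example; only Regex.Replace and the RegexOptions values come from the thread.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module SinglelineDemo
    ' Pattern copied unchanged from the question.
    Private Const Pattern As String = "<div.*?class=""RemoveThis"">.*?</div>"

    ' Hypothetical helper: applies the pattern with the given options.
    Function Clean(html As String, options As RegexOptions) As String
        Return Regex.Replace(html, Pattern, "", options)
    End Function

    Sub Main()
        ' Sample input (an assumption): the block to remove spans several lines.
        Dim html As String = String.Join(vbLf, {
            "<p>Hello</p>",
            "<div class=""RemoveThis"">",
            "Blah blah blah<br />",
            "<br />",
            "</div>",
            "<p>Bye</p>"
        })

        ' Without Singleline the dot cannot cross a newline, so nothing matches.
        Dim withoutSingleline As String = Clean(html, RegexOptions.IgnoreCase)
        Console.WriteLine(withoutSingleline.Contains("RemoveThis"))

        ' With Singleline the dot also matches newlines, so the block is removed.
        Dim withSingleline As String = Clean(html, RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Console.WriteLine(withSingleline.Contains("RemoveThis"))
    End Sub
End Module
```

The first call should leave the input untouched and the second should strip the whole div, which is the behavior the question was after.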
Also, you should consider using an HTML parser instead of regular expressions if you need to parse HTML.
Upvotes: 3
Reputation: 172478
Add the Singleline option:
html = Regex.Replace(html, "<div.*?class=""RemoveThis"">.*?</div>", "", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
From MSDN:
Singleline: Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n).
PS: Parsing HTML with regular expressions is discouraged. Your code will fail on something like this, because the lazy .*? stops at the first </div> it finds and leaves the rest of the block behind:
<div class="RemoveThis">
<div>bla</div>
<div>bla</div>
</div>
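To make that failure concrete, here is a small sketch that applies the same pattern (with Singleline enabled) to a nested block. The module name NestedDivDemo and the helper Clean are invented for the example, and the input assumes the outer div carries the RemoveThis class from the question.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module NestedDivDemo
    ' Pattern copied unchanged from the question.
    Private Const Pattern As String = "<div.*?class=""RemoveThis"">.*?</div>"

    ' Hypothetical helper: the call suggested above, with Singleline enabled.
    Function Clean(html As String) As String
        Return Regex.Replace(html, Pattern, "",
                             RegexOptions.IgnoreCase Or RegexOptions.Singleline)
    End Function

    Sub Main()
        ' Nested input (an assumption): two inner divs inside the outer one.
        Dim html As String = String.Join(vbLf, {
            "<div class=""RemoveThis"">",
            "<div>bla</div>",
            "<div>bla</div>",
            "</div>"
        })

        Dim result As String = Clean(html)

        ' The match ends at the first inner </div>, so the second inner div
        ' and the outer closing tag are still present in the result.
        Console.WriteLine(result.Contains("<div>bla</div>"))
        Console.WriteLine(result.Trim().EndsWith("</div>"))
    End Sub
End Module
```

Both checks should report True, meaning the output still contains markup that was supposed to be removed, including an orphaned closing tag. A regex cannot easily count nesting depth, which is why a parser is the safer tool here.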
Upvotes: 4