Bobby Ortiz

Reputation: 3147

How to write a Multi-line RegEx Expression

I have a vb.net class that cleans some html before emailing the results.

Here is a sample of some html I need to remove:

    <div class="RemoveThis">
      Blah blah blah<br /> 
      Blah blah blah<br /> 
      Blah blah blah<br /> 
      <br /> 
    </div>

I already use Regex for most of this cleanup. What pattern would replace the entire block above with nothing?

I tried the following, but something is wrong:

    ' html holds all of my text
    html = Regex.Replace(html, "<div.*?class=""RemoveThis"">.*?</div>", "", RegexOptions.IgnoreCase)

Thanks.

Upvotes: 2

Views: 3547

Answers (2)

Mark Byers

Reputation: 839234

Try:

    RegexOptions.IgnoreCase Or RegexOptions.Singleline

The RegexOptions.Singleline option changes the meaning of the dot from 'match anything except new line' to 'match anything'.
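To make the difference concrete, here is a minimal sketch that runs the pattern from the question with and without Singleline. The sample input and the module name are made up for illustration; only the pattern and the options come from the question.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module SinglelineDemo
    Sub Main()
        ' Hypothetical sample input; vbLf stands in for the line breaks in the real HTML.
        Dim html As String = "<p>Keep</p>" & vbLf &
            "<div class=""RemoveThis"">" & vbLf &
            "  Blah blah blah<br />" & vbLf &
            "</div>" & vbLf &
            "<p>Also keep</p>"
        Dim pattern As String = "<div.*?class=""RemoveThis"">.*?</div>"

        ' Without Singleline, "." cannot cross a line break, so nothing matches
        ' and the input comes back unchanged.
        Dim unchanged As String = Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase)
        Console.WriteLine(unchanged = html)

        ' With Singleline, "." also matches the line break and the block is removed.
        Dim cleaned As String = Regex.Replace(html, pattern, "",
            RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Console.WriteLine(cleaned.Contains("RemoveThis"))
    End Sub
End Module
```

The first line printed should be True (no replacement happened) and the second False (the block is gone).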

Also, you should consider using an HTML parser instead of regular expressions if you need to parse HTML.
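As a rough sketch of the parser route, assuming the third-party HtmlAgilityPack NuGet package is installed (the question does not mention it, and any HTML parser would do), something like this removes the block by element rather than by text. The helper name RemoveBlocks is made up for illustration, and the XPath only matches when the class attribute is exactly RemoveThis.

```vbnet
Imports System.Linq
Imports HtmlAgilityPack ' third-party NuGet package, assumed to be installed

Module ParserSketch
    ' Hypothetical helper: removes every <div class="RemoveThis"> element,
    ' including any elements nested inside it.
    Function RemoveBlocks(html As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing when no node matches, so guard before looping.
        Dim nodes = doc.DocumentNode.SelectNodes("//div[@class='RemoveThis']")
        If nodes IsNot Nothing Then
            ' Copy to a list first so the loop does not run over the live collection.
            For Each node In nodes.ToList()
                node.Remove()
            Next
        End If

        Return doc.DocumentNode.OuterHtml
    End Function
End Module
```

Calling `html = RemoveBlocks(html)` would then take the place of the Regex.Replace line.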

Upvotes: 3

Heinzi

Reputation: 172478

Add the Singleline option:

    html = Regex.Replace(html, "<div.*?class=""RemoveThis"">.*?</div>", "", RegexOptions.IgnoreCase Or RegexOptions.Singleline)

From MSDN:

Singleline: Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n).

PS: Parsing HTML with regular expressions is discouraged. Your code will fail on nested elements like the following, because the lazy .*? stops at the first </div>:

    <div class="RemoveThis">
        <div>bla</div>
        <div>bla</div>
    </div>
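To see that failure concretely, here is a minimal sketch that runs the Singleline version of the pattern against a nested variant of the block from the question. The input string and module name are illustrative only.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module NestedDivDemo
    Sub Main()
        ' Hypothetical input: the div to remove contains two inner divs.
        Dim html As String = "<div class=""RemoveThis"">" & vbLf &
            "    <div>bla</div>" & vbLf &
            "    <div>bla</div>" & vbLf &
            "</div>"

        Dim result As String = Regex.Replace(html,
            "<div.*?class=""RemoveThis"">.*?</div>", "",
            RegexOptions.IgnoreCase Or RegexOptions.Singleline)

        ' The lazy ".*?" stops at the first "</div>", so the second inner div
        ' and the outer closing tag are left behind in the output.
        Console.WriteLine(result)
    End Sub
End Module
```

The match ends at the first inner closing tag, so the remaining markup is no longer balanced. A regular expression of this form cannot pair an opening tag with its own closing tag once elements nest.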

Upvotes: 4
