Niels

Reputation: 49929

Multiline regular expression not matching

I have a regular expression to get all the data in between HTML comments. Below is my regex and my HTML part.

Dim rgx As New Regex("<!-- START data-contentid='([0-9]+)' -->((\s|.)*?)<!-- END data-contentid='([0-9]+)' -->", RegexOptions.Multiline Or RegexOptions.IgnoreCase)

This regex works: it returns 2 matches with the desired groups. The strange part is this:

If I change ((\s|.)*?) to (.*?), the regex stops matching, even though . is supposed to stand for any character.

Any clue why the alternation version works but the dot version does not?

<!-- START data-contentid='1151' -->
<div class="dyn-content content" data-contentid="1151">
The content

</div>
<!-- END data-contentid='1151' --><br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
    AABB
<!-- START data-contentid='866' -->
<div class="dyn-content content" data-contentid="866">
    <h1></h1>
    ASBCSDFGGGGGGGGGGGGGGGGGGGGGGGGGG</div>
<!-- END data-contentid='866' -->

Upvotes: 0

Views: 42

Answers (1)

Patrick Hofman

Reputation: 156988

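RegexOptions.Multiline does not make . match line breaks; it only changes how ^ and $ behave. By default, . matches any character except \n, so (.*?) cannot span several lines, while (\s|.) can because \s matches the newline.

You meant RegexOptions.Singleline, which makes . match every character, including \n.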

From MSDN:

Multiline mode. Changes the meaning of ^ and $ so they match at the beginning and end, respectively, of any line, and not just the beginning and end of the entire string.
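
To illustrate the difference, here is a minimal sketch that runs the (.*?) variant of the pattern from the question under both options. The sample input is a shortened, made-up version of the HTML in the question, and the module and variable names are only for illustration.

```vb
Imports System
Imports System.Text.RegularExpressions

Module SinglelineDemo
    Sub Main()
        ' Shortened, hypothetical input: two comment-delimited blocks separated by line feeds.
        Dim html As String =
            "<!-- START data-contentid='1151' -->" & vbLf &
            "<div>The content</div>" & vbLf &
            "<!-- END data-contentid='1151' -->" & vbLf &
            "<!-- START data-contentid='866' -->" & vbLf &
            "<div>More content</div>" & vbLf &
            "<!-- END data-contentid='866' -->"

        ' The pattern from the question, with ((\s|.)*?) replaced by (.*?).
        Dim pattern As String =
            "<!-- START data-contentid='([0-9]+)' -->(.*?)<!-- END data-contentid='([0-9]+)' -->"

        ' Multiline only changes ^ and $, so . still stops at every line feed
        ' and no match is expected here.
        Dim multi As New Regex(pattern, RegexOptions.Multiline Or RegexOptions.IgnoreCase)
        Console.WriteLine("Multiline matches: " & multi.Matches(html).Count)

        ' Singleline lets . match the line feed too, so each block can match.
        Dim rgx As New Regex(pattern, RegexOptions.Singleline Or RegexOptions.IgnoreCase)
        For Each m As Match In rgx.Matches(html)
            ' Group 1 is the content id, group 2 is the text between the comments.
            Console.WriteLine(m.Groups(1).Value & ": " & m.Groups(2).Value.Trim())
        Next
    End Sub
End Module
```

With Singleline, the content sits in group 2 and the closing id moves to group 3, because the inner (\s|.) group is gone. The two options can also be combined with Or if ^ and $ anchors are needed elsewhere in the pattern.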

Upvotes: 2
