Reputation: 49929
I have a regular expression to get all the data in between HTML comments. Below is my regex and my HTML part.
Dim rgx As New Regex("<!-- START data-contentid='([0-9]+)' -->((\s|.)*?)<!-- END data-contentid='([0-9]+)' -->", RegexOptions.Multiline Or RegexOptions.IgnoreCase)
This regex works: it returns 2 results with the desired groups. The strange part is this:
If I change ((\s|.)*?) to (.*?), my regex stops working, even though . is supposed to stand for any character.
Any clue why the OR version works but the DOT version does not?
<!-- START data-contentid='1151' -->
<div class="dyn-content content" data-contentid="1151">
The content
</div>
<!-- END data-contentid='1151' --><br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
AABB
<!-- START data-contentid='866' -->
<div class="dyn-content content" data-contentid="866">
<h1></h1>
ASBCSDFGGGGGGGGGGGGGGGGGGGGGGGGGG</div>
<!-- END data-contentid='866' -->
Upvotes: 0
Views: 42
Reputation: 156988
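RegexOptions.Multiline does not make . match newlines; it only changes how ^ and $ behave. By default . matches any character except \n, so (.*?) cannot cross a line break, while (\s|.)*? can, because \s matches the newline.
You meant RegexOptions.Singleline, which makes . match every character, including \n.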
From MSDN:
Multiline mode. Changes the meaning of ^ and $ so they match at the beginning and end, respectively, of any line, and not just the beginning and end of the entire string.
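To see the difference, here is a minimal sketch that runs the (.*?) variant of the pattern under both options. The input is a shortened, assumed version of the HTML from the question, and the module and variable names are made up for illustration.

```vb
Imports System
Imports System.Text.RegularExpressions

Module SinglelineDemo
    Sub Main()
        ' Shortened sample input (an assumption, trimmed from the question's HTML).
        Dim html As String =
            "<!-- START data-contentid='1151' -->" & vbLf &
            "<div>The content</div>" & vbLf &
            "<!-- END data-contentid='1151' -->"

        ' The question's pattern with ((\s|.)*?) replaced by (.*?).
        Dim pattern As String =
            "<!-- START data-contentid='([0-9]+)' -->(.*?)<!-- END data-contentid='([0-9]+)' -->"

        ' Multiline only affects ^ and $, so . still stops at each newline.
        Dim withMultiline As New Regex(pattern, RegexOptions.Multiline Or RegexOptions.IgnoreCase)

        ' Singleline lets . match every character, including the newline.
        Dim withSingleline As New Regex(pattern, RegexOptions.Singleline Or RegexOptions.IgnoreCase)

        Console.WriteLine(withMultiline.IsMatch(html))   ' False
        Console.WriteLine(withSingleline.IsMatch(html))  ' True
    End Sub
End Module
```

Besides being the intended option, Singleline with (.*?) avoids the alternation in (\s|.)*?, which can backtrack heavily on large inputs.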
Upvotes: 2