Reputation: 6403
I have the following Regex in C#:
Regex h1Separator = new Regex(@"<h1>(?'name'[\w\d\s]+?)(<br\s?/?>)?</h1>", RegexOptions.Singleline);
I am trying to match a string that looks like this:
<h1>test content<br>
</h1>
Right now it only matches strings that look like the following:
<h1>test content<br></h1>
<h1>test content</h1>
What am I doing wrong? Should I be matching for a newline character? If so, what is it in C#? I can't find one.
Upvotes: 0
Views: 531
Reputation: 11463
Use the Multiline flag. (Edited to address my misspeaking about the .NET platform.)
Singleline mode changes only the meaning of the dot: with it, . also matches newline characters. It has no effect on ^ and $, which by default anchor to the start and end of the entire string rather than to each line within it. Example: <h1>(?'name'[\w\d\s]+?)(<br\s?/?>)?</h1>
will match this:
<h1>test content<br></h1>
Multiline mode changes the meaning of ^ and $ to the beginning and ending of each line within the string (i.e. they will look at every line break).
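Regex h1Separator = new Regex(@"<h1>(?'name'[\w\d\s]+?)(<br\s?/?>)?$\s*</h1>", RegexOptions.Multiline);
will match the desired pattern when the line ends in \n (the $ is zero-width, so the \s* is what actually consumes the line break):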
<h1>test content<br>
</h1>
In short, you need to tell the regex parser you expect to work with multiple lines. It helps to have a regex designer that speaks your dialect of regex. There are many.
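For reference, here is a minimal sketch of what the two flags change in .NET. The sample input and patterns are made up for illustration and are not from the question; the snippet assumes a compiler that accepts top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative input: two lines separated by a single '\n'.
string input = "first<br>\nsecond";

// Singleline only affects '.': with the flag, '.' also matches '\n'.
Console.WriteLine(Regex.IsMatch(input, "<br>.second"));                            // False
Console.WriteLine(Regex.IsMatch(input, "<br>.second", RegexOptions.Singleline));   // True

// Multiline only affects '^' and '$': with the flag, they also match at line breaks.
Console.WriteLine(Regex.IsMatch(input, "^second"));                                // False
Console.WriteLine(Regex.IsMatch(input, "^second", RegexOptions.Multiline));        // True

// '$' is zero-width: it matches before the '\n' but does not consume it.
Console.WriteLine(Regex.IsMatch(input, "<br>$second", RegexOptions.Multiline));    // False
Console.WriteLine(Regex.IsMatch(input, @"<br>$\nsecond", RegexOptions.Multiline)); // True
```

The last pair is the relevant one here: an anchor alone never moves past the line break, so something in the pattern still has to match the newline itself.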
Upvotes: -1
Reputation: 4778
You can either add a dot . to your pattern before the ending </h1> and keep the RegexOptions.Singleline option, or change it to RegexOptions.Multiline and add a $ to the regex before the </h1>. Details here.
Upvotes: 0
Reputation: 7959
You don't check for whitespace between the end of the br tag and the start of the next tag, so it expects to see the closing </h1> tag immediately after. Add a \s* in between to allow that.
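As a quick check, the suggestion can be sketched like this. The pattern is the one from the question with \s* inserted before </h1>; the sample strings are the three from the question, and the snippet assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// The question's pattern with \s* added between the optional <br> and </h1>.
var h1Separator = new Regex(@"<h1>(?'name'[\w\d\s]+?)(<br\s?/?>)?\s*</h1>");

string[] samples =
{
    "<h1>test content<br>\n</h1>",  // the case that failed in the question
    "<h1>test content<br></h1>",
    "<h1>test content</h1>",
};

foreach (string sample in samples)
{
    Match m = h1Separator.Match(sample);
    // Prints "True: 'test content'" for each of the three samples.
    Console.WriteLine($"{m.Success}: '{m.Groups["name"].Value}'");
}
```

No RegexOptions flag is needed for this version, since the pattern contains no dot and no ^ or $ anchors.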
Upvotes: 4
Reputation: 16168
You have it defined as a single-line regex; see the RegexOptions.Singleline flag :) Use RegexOptions.Multiline instead.
Upvotes: 1
Reputation: 55563
The newline character in C# is \n. However, I am not skilled in regex and couldn't tell you what would happen if there was a newline in a regular expression.
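To fill in that gap, here is a small sketch of how a newline behaves inside a .NET pattern. The strings are illustrative and the snippet assumes top-level statements (C# 9 or later). A newline can reach the regex engine either as an actual character (via the C# escape in a regular string literal) or as the regex escape \n (via a verbatim string), and both match a newline in the input.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "<br>\n</h1>";

// Regular string literal: C# turns \n into a newline character before the
// regex engine sees the pattern; a literal newline matches itself.
Console.WriteLine(Regex.IsMatch(text, "<br>\n</h1>"));   // True

// Verbatim string literal: the regex engine receives backslash-n and
// interprets it as "match a newline".
Console.WriteLine(Regex.IsMatch(text, @"<br>\n</h1>"));  // True

// Windows-style input ends lines with \r\n, so allow an optional \r.
Console.WriteLine(Regex.IsMatch("<br>\r\n</h1>", @"<br>\r?\n</h1>"));  // True
```

If the input may come from files with either line-ending style, the \r?\n form (or a whitespace class such as \s) avoids depending on one of them.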
Upvotes: 0