Brian Hicks

Reputation: 6403

Why is my C# Regular Expression not matching between lines?

I have the following Regex in C#:

Regex h1Separator = new Regex(@"<h1>(?'name'[\w\d\s]+?)(<br\s?/?>)?</h1>", RegexOptions.Singleline);

I'm trying to match a string that looks like this:

<h1>test content<br>
</h1>

Right now it only matches strings that look like the following:

<h1>test content<br></h1>
<h1>test content</h1>

What am I doing wrong? Should I be matching for a newline character? If so, what is it in C#? I can't find one.

Upvotes: 0

Views: 531

Answers (5)

Berin Loritsch

Reputation: 11463

Use the Multiline flag. (Edited to address my misspeaking about the .NET platform.)

Singleline mode changes the meaning of the dot so that it also matches newlines. It does not affect ^ and $, which without Multiline anchor to the beginning and end of the entire string rather than of each line within it. Since the pattern contains no dot, Singleline changes nothing here. Example: <h1>(?'name'[\w\d\s]+?)(<br\s?/?>)?</h1> will match this:

<h1>test content<br></h1> 

Multiline mode changes the meaning of ^ and $ to the beginning and ending of each line within the string (i.e. they will look at every line break).

Regex h1Separator = new Regex(@"<h1>(?'name'[\w\d\s]+?)(<br\s?/?>)?\s*$\s*</h1>", RegexOptions.Multiline);

will match the desired pattern:

<h1>test content<br> 
</h1> 

In short, you need to tell the regex parser you expect to work with multiple lines. It helps to have a regex designer that speaks your dialect of regex. There are many.
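
To see what the two options do on their own, here is a minimal sketch. The sample input "first\nsecond" and the class and method names are made up for illustration. It relies on the documented .NET behavior that Multiline changes ^ and $, while Singleline changes what the dot matches.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexOptionsDemo
{
    // Hypothetical sample input: two lines separated by a newline.
    public const string Input = "first\nsecond";

    // Counts how many times the pattern matches Input with the given options.
    public static int Count(string pattern, RegexOptions options) =>
        Regex.Matches(Input, pattern, options).Count;

    public static void Main()
    {
        // Default: ^ anchors only at the start of the whole string.
        Console.WriteLine(Count(@"^\w+", RegexOptions.None));      // 1
        // Multiline: ^ also anchors right after each newline.
        Console.WriteLine(Count(@"^\w+", RegexOptions.Multiline)); // 2
        // Default: . does not match \n, so the pattern cannot span the line break.
        Console.WriteLine(Regex.IsMatch(Input, @"first.second"));  // False
        // Singleline: . matches \n as well.
        Console.WriteLine(Regex.IsMatch(Input, @"first.second", RegexOptions.Singleline)); // True
    }
}
```

Neither option makes a pattern skip over a newline by itself; something in the pattern still has to consume it.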

Upvotes: -1

vlad

Reputation: 4778

You can either add a dot . to your regex before the ending </h1> and keep the RegexOptions.Singleline option, or change it to RegexOptions.Multiline and add a $ to the regex before the </h1>. Details here.

Upvotes: 0

MikeP

Reputation: 7959

You don't check for whitespace between the end of the br tag and the start of the next tag, so the regex expects to see the closing </h1> tag immediately after. Add a \s* in between to allow that.
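
A minimal sketch of that change, assuming the \s* goes right before </h1>. The class name and test inputs are made up for illustration. The options argument is left out because the pattern contains no dot or anchors, so neither Singleline nor Multiline would change the result.

```csharp
using System;
using System.Text.RegularExpressions;

public static class H1SeparatorDemo
{
    // The question's pattern with \s* added between the optional <br> and </h1>.
    public static readonly Regex H1Separator =
        new Regex(@"<h1>(?'name'[\w\d\s]+?)(<br\s?/?>)?\s*</h1>");

    public static void Main()
    {
        string[] inputs =
        {
            "<h1>test content<br>\n</h1>", // the case that failed before
            "<h1>test content<br></h1>",
            "<h1>test content</h1>",
        };

        foreach (string input in inputs)
        {
            Match m = H1Separator.Match(input);
            // Prints "True: 'test content'" for each of the three inputs.
            Console.WriteLine($"{m.Success}: '{m.Groups["name"].Value}'");
        }
    }
}
```

Because \s also matches \r, the same pattern should cope with Windows-style \r\n line endings.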

Upvotes: 4

nothrow

Reputation: 16168

You have it defined as a single-line regex (see the RegexOptions.Singleline flag) :) Use RegexOptions.Multiline instead.

Upvotes: 1

Richard J. Ross III

Reputation: 55563

The newline character in C# is \n. However, I am not skilled in regex and couldn't tell you what would happen if there was a newline in a regular expression.
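
For what it's worth, here is a small sketch of matching a newline explicitly in a .NET pattern. Inside a verbatim string, the regex engine itself interprets \n as a line feed, and \r?\n is a common way to also accept Windows-style \r\n endings. The class name, method name, and sample text are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NewlineDemo
{
    // Matches either a bare line feed or a carriage return plus line feed.
    public static readonly Regex LineBreak = new Regex(@"\r?\n");

    // Returns the number of line breaks found in the text.
    public static int CountLineBreaks(string text) =>
        LineBreak.Matches(text).Count;

    public static void Main()
    {
        // One Unix-style and one Windows-style line break.
        Console.WriteLine(CountLineBreaks("a\nb\r\nc")); // 2
    }
}
```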

Upvotes: 0
