MarkB42

Reputation: 725

Treating \r as \n in C# regex

I have a C# function that finds patterns of text inside an input and does some processing. (I am using version 3.5 of the .NET Framework.)

public void func(string s)
{
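    // Verbatim string (@"...") so that \s reaches the regex engine instead of being an invalid C# escape
    Regex r = new Regex(@"^\s*Pattern\s*$", RegexOptions.Multiline | RegexOptions.ExplicitCapture);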
    Match m = r.Match(s);
    //Do something with m
}

A use of the function might look like this:

string s = "Pattern \n Pattern \n non-Pattern";
func(s);

However, I am finding that sometimes my input looks more like this:

string s = "Pattern \r Pattern \r non-Pattern";
func(s);

And it is not being matched. Is there a way to have \r treated like \n within the regex? I figure I could always just replace every \r with \n, but I was hoping to minimize operations by getting the regex to do it all at once.

Upvotes: 5

Views: 3689

Answers (3)

Scott Chamberlain

Reputation: 127543

Unfortunately, when I have run into similar situations, the only solution I found that works is to do two passes with the regex (which is what you were hoping to avoid). The first pass normalizes the line endings, then the second can do the search as normal. I could not find any way to get Multiline to trigger on just \r.

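public void func(string s)
{
    // Pass 1: normalize every line-ending variant to \r\n
    s = Regex.Replace(s, @"(\r\n|\n\r|\n|\r)", "\r\n");
    // Pass 2: search as normal; the pattern is a verbatim string so \s is not treated as a C# escape
    Regex r = new Regex(@"^\s*Pattern\s*$", RegexOptions.Multiline | RegexOptions.ExplicitCapture);
    Match m = r.Match(s);
    //Do something with m
}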

Upvotes: 2

nhahtdh

Reputation: 56809

According to the documentation on Anchors in Regular Expressions:

  • ^ in Multiline mode will match the beginning of input string, or the start of the line (as defined by \n).
  • $ in Multiline mode will match the end of input string, or just before \n.

If your purpose is to redefine the anchors so that both \r and \n delimit a line, then you have to simulate them with look-ahead and look-behind.

  • ^ should be simulated with (?<=\A|[\r\n])
  • $ should be simulated with (?=\Z|[\r\n])

Note that the simulation above treats the string \r\n as having 3 starts of line and 3 ends of line: 1 start and 1 end are defined by the start and end of the string, and the other 2 starts and 2 ends are defined by \r and \n.
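
To make the look-around idea concrete, here is a minimal sketch applied to the pattern from the question. The class name LineAnchorDemo and the helper CountPatternLines are hypothetical names used only for illustration, and I have assumed [ \t]* in place of \s* so that the padding around Pattern cannot itself swallow a \r or \n. Treat it as one way it could look, not a definitive implementation.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineAnchorDemo
{
    // Hypothetical helper: counts lines that contain only "Pattern",
    // optionally padded with spaces or tabs.
    // (?<=\A|[\r\n]) stands in for ^ and (?=\Z|[\r\n]) stands in for $,
    // so both \r and \n act as line delimiters without RegexOptions.Multiline.
    public static int CountPatternLines(string input)
    {
        Regex r = new Regex(@"(?<=\A|[\r\n])[ \t]*Pattern[ \t]*(?=\Z|[\r\n])");
        return r.Matches(input).Count;
    }

    public static void Main()
    {
        // Input using \r only, as in the question
        Console.WriteLine(CountPatternLines("Pattern \r Pattern \r non-Pattern")); // prints 2
        // Input using \n only
        Console.WriteLine(CountPatternLines("Pattern \n Pattern \n non-Pattern")); // prints 2
    }
}
```

Because the look-arounds are zero-width, the line delimiters are not part of the match, just as with the real ^ and $ anchors.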

Upvotes: 2

Josh

Reputation: 277

You can match either \n or \r if you place them in a character class:

[\n\r]

That will match a single character that is either \n or \r.

Upvotes: 1
