I need to split a text into lines, but I also need to keep the line-break characters at the end of each line.
var text = "abc\r\ndef"; // should be two lines and 8 characters
// var text = "abc\rdef"; // should be two lines and 7 characters
// var text = "abc\ndef"; // should be two lines and 7 characters
var lines = Regex.Split(text, @"(?<=[\r\n|\r|\n])");
// I was hoping it would split into two lines:
// "abc\r\n"
// "def"
var countChars = 0;
foreach (var line in lines)
{
    countChars += line.Length;
}
Assert.That(countChars, Is.EqualTo(8));
Assert.That(lines.Length, Is.EqualTo(2));
It did not seem complicated at first, but I cannot make it work. Perhaps someone has a hint?
The problem is that the lookbehind is tried at every position inside the string, so in "abc\r\ndef" it succeeds both at the position right after \r (between \r and \n) and at the position right after \n. The string is therefore cut into three pieces, "abc\r", "\n" and "def", instead of two. Moreover, [\r\n|\r|\n] is a character class, so it is just the same as [\r|\n]: it matches a single CR, LF or pipe character.
If you want to make sure you only match a position immediately preceded by a CRLF, by a CR that has no LF after it, or by an LF that has no CR before it, you can use
(?<=\r\n|(?<!\r)\n|\r(?!\n))
See the regex demo; it matches:
- (?<= - a positive lookbehind that matches a location immediately preceded by
- \r\n - a CRLF sequence
- | - or
- (?<!\r)\n - an LF not immediately preceded by a CR
- | - or
- \r(?!\n) - a CR not immediately followed by an LF
- ) - end of the lookbehind.
See the C# demo:
using System;
using System.Text.RegularExpressions;

var text = "abc\r\ndef";
foreach (var s in Regex.Split(text, @"(?<=\r\n|(?<!\r)\n|\r(?!\n))"))
    Console.WriteLine("'{0}'", s);
Output:
'abc
'
'def'
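To check the fixed pattern against all three inputs from the question, here is a minimal sketch, again assuming C# 9 top-level statements; the SplitKeepingBreaks name is hypothetical. It also shows one behaviour worth knowing: when the text ends with a line break, Regex.Split returns a trailing empty string, which you may want to drop.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the answer's pattern: split after a CRLF,
// after a lone LF, or after a lone CR, keeping the break on each line.
static string[] SplitKeepingBreaks(string text) =>
    Regex.Split(text, @"(?<=\r\n|(?<!\r)\n|\r(?!\n))");

// The three inputs from the question with their expected total lengths.
var cases = new (string Text, int Chars)[]
{
    ("abc\r\ndef", 8),
    ("abc\rdef", 7),
    ("abc\ndef", 7),
};

foreach (var (text, chars) in cases)
{
    string[] lines = SplitKeepingBreaks(text);
    // Each case should give two lines whose lengths add up to the original length.
    Console.WriteLine($"{lines.Length} lines, {lines.Sum(l => l.Length)} chars (expected {chars})");
}

// A trailing break yields an empty last element: "abc\r\n" -> "abc\r\n", "".
string[] trailing = SplitKeepingBreaks("abc\r\n");
Console.WriteLine(trailing.Length);  // 2
```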