Reputation: 358
I've noticed that when using ReadLine() on a StreamReader or StringReader, if the file or string ends with a newline, that character sequence is lost entirely. Consider the following example:
using System;
using System.Collections.Generic;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        string data = "First Line\r\nSecond Line\r\n\r\n\r\n";
        List<string> lineData = new List<string>();

        // Tokenize on "\r\n", keeping empty entries.
        string[] splitData = data.Split(
            new string[] { "\r\n" },
            StringSplitOptions.None);

        // Read the same data line by line.
        using (StringReader sr = new StringReader(data))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
                lineData.Add(line);
        }

        Console.WriteLine("Raw Line Count: " + splitData.Length);
        Console.WriteLine("StringReader Line Count: " + lineData.Count);

        Console.WriteLine("Split Data: ");
        foreach (string s in splitData)
            Console.WriteLine(string.IsNullOrEmpty(s) ? "[blank line]" : s);

        Console.WriteLine("StringReader Data: ");
        foreach (string s in lineData)
            Console.WriteLine(string.IsNullOrEmpty(s) ? "[blank line]" : s);

        Console.ReadKey();
    }
}
The output is as such:
Raw Line Count: 5
StringReader Line Count: 4
Split Data:
First Line
Second Line
[blank line]
[blank line]
[blank line]
StringReader Data:
First Line
Second Line
[blank line]
[blank line]
Why does the StringReader/StreamReader behave this way? I can think of several workarounds, but it seems silly to have to rework my code because the reader behaves in an unexpected way. Is there some setting in some .NET library that will affect the way a stream processes the final newline?
Here's another example: Compare the results when running the example first against "First Line\r\nSecond Line" and then against "First Line\r\nSecond Line\r\n". The results are exactly the same (as far as the StringReader portion of the example is concerned). Why would the StringReader return null in the second example instead of an empty string? I'm aware that the string returned from ReadLine() doesn't include the newline, but why would the last line be interpreted as null instead of ""?
Upvotes: 5
Views: 1724
Reputation: 35891
The difference in your output is not caused by any strange behaviour of the StringReader. Note that your input contains only four lines, and exactly four lines are being read (only without the terminating newline sequences, as specified by the documentation). It's the Split method that introduces an extra entry: because you asked it to keep empty entries (StringSplitOptions.None), an empty entry is created after the last separator.
Output of StringReader:
"First Line\r\nSecond Line\r\n\r\n\r\n";
 ^1st          ^2nd           ^3rd^4th (line)
Output of Split:
"First Line\r\nSecond Line\r\n\r\n\r\n";
 ^1st          ^2nd           ^3rd^4th^5th (token)
Consider this input:
"First Line\r\n"
How many lines does it contain? One, and that is exactly what the StringReader returns, while Split produces two entries:
Split Data:
First Line
[blank line]
StringReader Data:
First Line
So it seems that it's the Split that is the "problem" (if there is any) here.
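To make the two counts agree, one option is to drop the single empty token that Split produces after a final separator. The following is only a minimal sketch: the class and method names (LineCountDemo, ReadLines, SplitLines) are made up for illustration, and it assumes "\r\n" is the only line terminator in the input.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LineCountDemo
{
    // Lines exactly as StringReader.ReadLine() reports them.
    public static List<string> ReadLines(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }

    // Split tokens, minus the empty token that follows a final "\r\n".
    // Assumption: "\r\n" is the only terminator used in the input.
    public static List<string> SplitLines(string text)
    {
        // Split returns one empty token for an empty string; ReadLine returns no lines.
        if (text.Length == 0)
            return new List<string>();

        var tokens = new List<string>(
            text.Split(new[] { "\r\n" }, StringSplitOptions.None));

        if (text.EndsWith("\r\n", StringComparison.Ordinal))
            tokens.RemoveAt(tokens.Count - 1);

        return tokens;
    }
}
```

With the data from the question, both methods return four entries: "First Line", "Second Line" and two empty strings.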
The real problem was described by Douglas in the comments below: inputs like "ABC\r\nXYZ" and "ABC\r\nXYZ\r\n" are indistinguishable. However, in typical use cases for the ReadLine interface you don't care about that. If you do care, you need to use a slightly lower-level interface (e.g. Read).
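As a rough illustration of dropping down to Read, the sketch below consumes a reader character by character and reports whether the last character was a line terminator. The TrailingNewline class and the EndsWithLineTerminator method are hypothetical names, not part of the framework, and the reader is fully consumed by the call.

```csharp
using System;
using System.IO;

static class TrailingNewline
{
    // Returns true when the final character of the input is '\n' or '\r'.
    // Note: this reads the reader to the end.
    public static bool EndsWithLineTerminator(TextReader reader)
    {
        int last = -1;
        int ch;
        while ((ch = reader.Read()) != -1)
            last = ch;

        return last == '\n' || last == '\r';
    }
}
```

For example, EndsWithLineTerminator(new StringReader("ABC\r\nXYZ")) is false, while the same call on "ABC\r\nXYZ\r\n" is true, which is exactly the distinction ReadLine hides.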
Upvotes: 3
Reputation: 101614
Per docs on ReadLine:
A line is defined as a sequence of characters followed by a line feed ("\n"), a carriage return ("\r"), or a carriage return immediately followed by a line feed ("\r\n"). The string that is returned does not contain the terminating carriage return or line feed. The returned value is null if the end of the input stream is reached.
You're using a method that tokenizes the input stream on line terminators and returns the result. (Per the definition quoted above, it recognizes "\n", "\r" and "\r\n" rather than relying solely on Environment.NewLine.) Since those tokens are excluded from the result, it stands to reason that the behavior you're seeing is the expected one.
If you need those characters, you're better off reading the file in chunks (using a standard Read with a buffer) and breaking out the content yourself. Alternatively, you could create your own implementation of a Stream that performs the task as you wish.
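A chunked approach along those lines might look like the sketch below. It reads through a buffer and yields each line with its terminator still attached, so concatenating the results reproduces the input. RawLineReader and ReadLinesWithTerminators are illustrative names, and the 4096-character buffer size is an arbitrary assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class RawLineReader
{
    // Yields each line including its terminator ("\n", "\r" or "\r\n").
    // A final line without a terminator is yielded as-is.
    public static IEnumerable<string> ReadLinesWithTerminators(TextReader reader)
    {
        var current = new StringBuilder();
        var buffer = new char[4096]; // arbitrary chunk size
        bool pendingCr = false;      // previous char was '\r'; a '\n' may follow
        int count;

        while ((count = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < count; i++)
            {
                char c = buffer[i];

                if (pendingCr)
                {
                    pendingCr = false;
                    if (c == '\n')
                    {
                        // "\r\n" pair: complete the current line.
                        current.Append(c);
                        yield return current.ToString();
                        current.Clear();
                        continue;
                    }
                    // A lone '\r' terminated the previous line.
                    yield return current.ToString();
                    current.Clear();
                }

                current.Append(c);
                if (c == '\r')
                {
                    pendingCr = true;
                }
                else if (c == '\n')
                {
                    yield return current.ToString();
                    current.Clear();
                }
            }
        }

        // Whatever remains: an unterminated last line or a trailing lone '\r'.
        if (current.Length > 0)
            yield return current.ToString();
    }
}
```

Because the terminators are kept, "ABC\r\nXYZ" yields "ABC\r\n" and "XYZ", whereas "ABC\r\nXYZ\r\n" yields "ABC\r\n" and "XYZ\r\n".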
Upvotes: 3
Reputation: 19407
That is expected and documented behaviour. From http://msdn.microsoft.com/en-us/library/system.io.stringreader.readline.aspx:
A line is defined as a sequence of characters followed by a line feed ("\n"), a carriage return ("\r"), or a carriage return immediately followed by a line feed ("\r\n"). The string that is returned does not contain the terminating carriage return or line feed. The returned value is null if the end of the string has been reached.
This means that the last value returned is null and the very last line break is omitted. If you need it in the data you read, you can reapply it by using Environment.NewLine.
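A small sketch of reapplying the line break follows; LineRestorer and ReadAndRestore are hypothetical names. Note the trade-offs: every line gets a terminator, including a last line that did not have one in the input, and all terminators are normalized to Environment.NewLine.

```csharp
using System;
using System.IO;
using System.Text;

static class LineRestorer
{
    // Reads the text line by line and appends Environment.NewLine after each line.
    public static string ReadAndRestore(string text)
    {
        var result = new StringBuilder();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                result.Append(line).Append(Environment.NewLine);
        }
        return result.ToString();
    }
}
```

This is enough when the goal is simply output that always ends with a newline; it cannot tell you whether the original input did.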
Upvotes: 4