Tono Nam

Reputation: 36048

StreamReader Read method doesn't read number of chars specified

I have to parse a large file, so instead of doing:

 string unparsedFile = myStreamReader.ReadToEnd(); // takes 4 seconds
 parse(unparsedFile); // takes another 4 seconds

I want to take advantage of the first 4 seconds and try to do both things at the same time by doing something like:

        while (true)
        {
            char[] buffer = new char[1024];

            var charsRead = sr.Read(buffer, 0, buffer.Length);

            if (charsRead < 1)
                break;

            if (charsRead != 1024)
            {
                Console.Write("Here");  // the debugger stops here several times. Why?
            }

            addChunkToQueue(buffer); 
        }

Here is a screenshot of the debugger (I added an int counter to show on which iteration fewer than 1024 chars were read):

[debugger screenshot]

Note that 643 chars were read, not 1024. On the next iteration I get:

[debugger screenshot]

I would expect to read 1024 bytes every time until the last iteration, where fewer than 1024 bytes remain.

So my question is: why do I read a "random" number of chars as I iterate through the while loop?


Edit

I don't know what kind of stream I am dealing with. I execute a process like this:

        ProcessStartInfo psi = new ProcessStartInfo("someExe.exe")
        {
            RedirectStandardError = true,
            RedirectStandardOutput = true,
            UseShellExecute = false,
            CreateNoWindow = true,
        };

        // execute the command and return its output
        using (var proc = new Process())
        {
            proc.StartInfo = psi;
            proc.Start();                               

            var output = proc.StandardOutput;  //  <------------- this is where I get the stream

            //if (string.IsNullOrEmpty(output))
            //output = proc.StandardError.ReadToEnd();

            return output;
        }
    }

Upvotes: 3

Views: 2180

Answers (3)

Tisho

Reputation: 8482

From the docs: http://msdn.microsoft.com/en-us/library/9kstw824

When using the Read method, it is more efficient to use a buffer that is the same size as the internal buffer of the stream, where the internal buffer is set to your desired block size, and to always read less than the block size. If the size of the internal buffer was unspecified when the stream was constructed, its default size is 4 kilobytes (4096 bytes). If you manipulate the position of the underlying stream after reading data into the buffer, the position of the underlying stream might not match the position of the internal buffer. To reset the internal buffer, call the DiscardBufferedData method; however, this method slows performance and should be called only when absolutely necessary.

So for the return value, the docs say:

The number of characters that have been read, or 0 if at the end of the stream and no data was read. The number will be less than or equal to the count parameter, depending on whether the data is available within the stream.

To summarize: your buffer and the underlying buffer are not the same size, so your buffer gets only partially filled whenever the underlying one has not been filled up yet.
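
In practice this means only the first charsRead elements of your buffer hold valid data after each call. Here is a minimal sketch of a chunked read that hands on exactly what was read. The names ChunkReader and ReadChunks are made up for illustration, and it takes any TextReader, so a StreamReader works too:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ChunkReader
{
    // Hypothetical helper: yields each chunk as a string containing
    // only the characters that Read actually delivered.
    public static IEnumerable<string> ReadChunks(TextReader reader, int bufferSize = 1024)
    {
        var buffer = new char[bufferSize];
        int charsRead;

        // Read returns 0 only at the end of the stream; any positive
        // count up to bufferSize is a normal result.
        while ((charsRead = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Copy just the valid prefix; the rest of the buffer may
            // still contain data from an earlier iteration.
            yield return new string(buffer, 0, charsRead);
        }
    }
}
```

Each chunk can then be queued for the parser without caring whether it is 1024 chars long or 643.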

Upvotes: 2

Wiktor Zychla

Reputation: 48230

It depends on the actual stream you are reading. If it is a file stream, I guess you are rather unlikely to get "partial" data. However, if you read from a network stream, you have to expect the data to arrive in chunks of different lengths.

Upvotes: 3

Jon Skeet

Reputation: 1500525

For one thing, you're reading characters, not bytes. There's a huge difference.
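
As a small illustration of that difference (a sketch only; the class and method names are invented, and UTF-8 is an assumption since the question does not say which encoding the stream uses):

```csharp
using System;
using System.Text;

static class CharsVsBytes
{
    // Hypothetical helper: number of bytes the text occupies in UTF-8.
    public static int Utf8ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }
}

// "h\u00e9llo" is 5 chars long, but Utf8ByteCount returns 6,
// because U+00E9 is encoded as two bytes in UTF-8.
```

So a count of chars returned by StreamReader says little about how many bytes were consumed from the underlying stream.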

As for why it doesn't necessarily read everything all at once: maybe there isn't that much data available, and StreamReader has decided to give you what it's got rather than blocking for an indeterminate amount of time to fill your buffer. It's entirely within its rights to do so.

Is this coming from a local file, or over the network? Normally local file operations are much more likely to fill the buffer than network downloads, but either way you simply shouldn't rely on the buffer being filled. If it's a "file" (i.e. read using FileStream) but it happens to be sitting on a network share... well, that's a grey area in my knowledge :) It's a stream - treat it that way.
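
If you really do need full buffers, you have to loop yourself until the buffer is full or the stream ends. Here is a rough sketch; FullBufferReader and ReadFully are names I made up for the example:

```csharp
using System;
using System.IO;

static class FullBufferReader
{
    // Hypothetical helper: keeps calling Read until the buffer is full
    // or the end of the stream is reached. Returns the number of chars
    // stored, which is less than buffer.Length only at the end.
    public static int ReadFully(TextReader reader, char[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = reader.Read(buffer, total, buffer.Length - total);
            if (n == 0)
                break; // end of stream
            total += n;
        }
        return total;
    }
}
```

As far as I know, TextReader.ReadBlock behaves much like this already, so check that before rolling your own. Bear in mind that either approach blocks until enough data has arrived, which may work against the goal of parsing while the data is still coming in.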

Upvotes: 4
