HiseaSaw

Reputation: 165

StreamReader has poor performance reading very large files simultaneously

I need to read four very large (>2 GB) files line by line, simultaneously, in a C# application. I'm using four different StreamReader objects and their ReadLine() method. Performance is seriously affected while reading lines from all four files at the same time, but it improves as each of them reaches EOF (performance with 4 files < performance with 3 files < performance with 2 files...).

I have this code (simplified, assuming only two files for a cleaner example):

StreamReader readerOne = new StreamReader(@"C:\temp\file1.txt");
StreamReader readerTwo = new StreamReader(@"C:\temp\file2.txt");

while (readerOne.Peek() >= 0 || readerTwo.Peek() >= 0)
{
    // Read the next line from each file that still has data
    string[] readerOneFields = readerOne.Peek() >= 0 ?
        readerOne.ReadLine().Split(',') : null;
    string[] readerTwoFields = readerTwo.Peek() >= 0 ?
        readerTwo.ReadLine().Split(',') : null;

    if (readerOneFields != null && readerTwoFields != null)
    {
        if (readerOneFields[2] == readerTwoFields[2])
        {
            // Do some boring things...
        }
    }
    else if (readerOneFields != null)
    {
        // ...
    }
    else
    {
        // ...
    }
}
readerOne.Close();
readerTwo.Close();

The reason I have to read these files at the same time is that I need to compare their lines, and afterwards write the results to a new file.

I've read a lot of questions about reading large files with StreamReader, but I couldn't find a scenario like mine. Is using the ReadLine() method the proper way to accomplish this? Is StreamReader even the proper class?

UPDATE: things are getting weirder now. Just for testing, I reduced the file sizes to about 10 MB by deleting lines, leaving only 70K records each. Furthermore, I tried with only two files (instead of four) at the same time, and I'm getting the same poor performance while reading from the two files simultaneously! When one of them reaches EOF, performance improves. I'm setting a StreamReader buffer size of 50 MB.
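This is roughly how I'm setting that buffer size, using the StreamReader overload that takes an explicit buffer size (the paths here are just placeholders):

using System.IO;
using System.Text;

const int BufferSize = 50 * 1024 * 1024; // 50 MB per reader

StreamReader readerOne = new StreamReader(
    @"C:\temp\file1.txt", Encoding.UTF8, true, BufferSize);
StreamReader readerTwo = new StreamReader(
    @"C:\temp\file2.txt", Encoding.UTF8, true, BufferSize);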

Upvotes: 2

Views: 1838

Answers (1)

Hans Passant

Reputation: 942267

By far the most expensive thing you could ever do with a disk is to force the read head to move from one track to another. It is a mechanical motion; the typical cost is about 13 milliseconds per track.

You are constantly moving the read head, forcing it back and forth from one file to another. Buffering is required to reduce that cost; in other words, read a lot of data from one file in one gulp. The operating system already does some buffering: it reads a track's worth of data from the file. You need more.

Use one of the StreamReader constructors that allows you to specify the buffer size. With files this large, a buffer size of 50 megabytes is appropriate.
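A minimal sketch of what that could look like (the FileOptions.SequentialScan hint and the UTF-8 encoding are assumptions added here for illustration, not part of the original answer):

using System.IO;
using System.Text;

const int BufferSize = 50 * 1024 * 1024; // 50 MB

// Opening the FileStream explicitly lets you pass FileOptions.SequentialScan,
// a hint that makes the OS file cache favor sequential read-ahead.
FileStream stream = new FileStream(
    @"C:\temp\file1.txt", FileMode.Open, FileAccess.Read,
    FileShare.Read, BufferSize, FileOptions.SequentialScan);

// This StreamReader overload takes an explicit buffer size, so most
// ReadLine() calls are served from the large in-memory buffer instead
// of forcing the read head to seek back to this file.
StreamReader reader = new StreamReader(stream, Encoding.UTF8, true, BufferSize);

With a buffer this large, each file is read in a few big gulps, so the head only moves between the files a handful of times instead of on nearly every line.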

Upvotes: 8
