webworm

Reputation: 11019

Writing a file line by line in C# is very slow using StreamReader/StreamWriter

I wrote a WinForms application that reads each line of a text file, does a search and replace on the line using Regex, and then writes it out to a new file. I chose the line-by-line method because some of the files are too large to load into memory.

I am using the BackgroundWorker object so the UI can be updated with the progress of the job. Below is the code (with parts omitted for brevity) that handles the reading and then outputting of the lines in the file.

public void bgWorker_DoWork(object sender, DoWorkEventArgs e)
{
    // Details of obtaining file paths omitted for brevity

    int totalLineCount = File.ReadLines(inputFilePath).Count();

    using (StreamReader sr = new StreamReader(inputFilePath))
    {
      int currentLine = 0;
      String line;
      while ((line = sr.ReadLine()) != null)
      {
        currentLine++;

        // Match and replace contents of the line
        // omitted for brevity

        if (currentLine % 100 == 0)
        {
          int percentComplete = (currentLine * 100 / totalLineCount);
          bgWorker.ReportProgress(percentComplete);
        }

        using (FileStream fs = new FileStream(outputFilePath, FileMode.Append, FileAccess.Write))
        using (StreamWriter sw = new StreamWriter(fs))
        {
          sw.WriteLine(line);
        }
      }
    }
}

Some of the files I am processing are very large (8 GB with 132 million rows). The process takes a very long time (a 2 GB file took about 9 hours to complete). It looks to be working at around 58 KB/sec. Is this expected or should the process be going faster?

Upvotes: 2

Views: 5111

Answers (3)

Scott Chamberlain

Reputation: 127543

Don't close and reopen the output file on every loop iteration; open the writer once, outside the loop. This should improve performance because the writer no longer needs to open the file and seek to its end for every single line.

Also, File.ReadLines(inputFilePath).Count() makes you read your input file twice, which could account for a big chunk of the time. Instead of basing the percentage on the line count, calculate it from the stream position.

public void bgWorker_DoWork(object sender, DoWorkEventArgs e) 
{ 
    // Details of obtaining file paths omitted for brevity

    // You can use this constructor (append: true) instead of a FileStream
    // opened with FileMode.Append; it does the same operation.
    using (StreamWriter sw = new StreamWriter(outputFilePath, true))
    using (StreamReader sr = new StreamReader(inputFilePath))
    {
      int lastPercentage = 0;
      String line;
      while ((line = sr.ReadLine()) != null)
      {

        // Match and replace contents of the line
        // omitted for brevity

        // Position and Length are longs, not ints, so we need to cast at the end.
        // StreamReader reads ahead in buffered chunks, so this is an approximation.
        int currentPercentage = (int)(sr.BaseStream.Position * 100L / sr.BaseStream.Length);
        if (lastPercentage != currentPercentage)
        {
          bgWorker.ReportProgress(currentPercentage);
          lastPercentage = currentPercentage;
        }

        sw.WriteLine(line);
      }
    }
}

Other than that, you will need to show what the "Match and replace contents of the line" step (omitted for brevity) does, as I would guess that is where your slowness comes from. Run a profiler on your code, see where it spends the most time, and focus your efforts there.
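If no profiler is at hand, a rough first measurement can be taken with System.Diagnostics.Stopwatch. The following is only a sketch under assumptions: LoopTimer, Measure, and the transform delegate are hypothetical names standing in for the omitted match-and-replace step, and the writer overwrites the output file rather than appending. It reports how much of the loop's wall-clock time is spent inside the transform.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper, not part of the original code.
public static class LoopTimer
{
    // Returns (total, inTransform): wall-clock time for the whole loop and
    // the part of it spent inside `transform`, which stands in for the
    // omitted match-and-replace step.
    public static Tuple<TimeSpan, TimeSpan> Measure(
        string inputPath, string outputPath, Func<string, string> transform)
    {
        Stopwatch total = Stopwatch.StartNew();
        Stopwatch inTransform = new Stopwatch();

        // Assumption: overwrite the output file (append: false).
        using (StreamWriter sw = new StreamWriter(outputPath, false))
        using (StreamReader sr = new StreamReader(inputPath))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                // Start() resumes the stopwatch, so elapsed time
                // accumulates across iterations.
                inTransform.Start();
                string changed = transform(line);
                inTransform.Stop();

                sw.WriteLine(changed);
            }
        }

        total.Stop();
        return Tuple.Create(total.Elapsed, inTransform.Elapsed);
    }
}
```

Starting and stopping a stopwatch on every line adds overhead of its own, so this is better run on a sample of the file than on the full 8 GB input. If the transform accounts for most of the total, the regex work is the place to look; if not, the remaining time is in reading and writing.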

Upvotes: 15

Thulani Chivandikwa

Reputation: 3539

Remove the File.ReadLines(inputFilePath).Count() call at the top, as it reads through the whole file just to get the number of lines.

Upvotes: 0

Gregory A Beamer

Reputation: 17010

Follow this process:

  1. Instantiate reader and writer
  2. Loop through lines, doing the next two steps
  3. In loop change line
  4. In loop write changed line
  5. Dispose of reader and writer

This should be a LOT faster than instantiating the writer on each line loop, as you have.

I was going to add a code sample, but it looks like someone else beat me to the punch; see @Scott Chamberlain's answer.
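For reference, the five steps listed above can be sketched in a self-contained form roughly as follows. LineRewriter, Rewrite, and the transform delegate are hypothetical names (the real regex replacement is not shown in the question), and the sketch assumes the output file should be overwritten rather than appended to.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original code.
public static class LineRewriter
{
    // Copies inputPath to outputPath one line at a time, applying
    // `transform` to each line. Returns the number of lines written.
    public static long Rewrite(
        string inputPath, string outputPath, Func<string, string> transform)
    {
        long linesWritten = 0;

        // Step 1: instantiate the reader and the writer once, before the loop.
        // Assumption: overwrite the output file (append: false).
        using (StreamReader reader = new StreamReader(inputPath))
        using (StreamWriter writer = new StreamWriter(outputPath, false))
        {
            string line;
            // Step 2: loop through the lines.
            while ((line = reader.ReadLine()) != null)
            {
                // Step 3: change the line.
                string changed = transform(line);

                // Step 4: write the changed line.
                writer.WriteLine(changed);
                linesWritten++;
            }
        } // Step 5: leaving the using blocks flushes and disposes both.

        return linesWritten;
    }
}
```

A call might look like LineRewriter.Rewrite(inputFilePath, outputFilePath, line => regex.Replace(line, replacement)), where regex and replacement are whatever the application already uses. Only one line is held in memory at a time, so the approach still works for files too large to load whole.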

Upvotes: 1
