Michael Goldshteyn

Reputation: 74410

C# multiple text file processing

Let's say that you want to write an application that processes multiple text files, supplied as arguments on the command line (e.g., MyProcessor file1 file2 ...). This is a very common task for which Perl is often used, but what if you wanted to take advantage of .NET directly and use C#?

What is the simplest C# 4.0 boilerplate code that allows you to do this? It should basically process each file line by line and do something with each line, either by calling a function to process it, or perhaps there is a better way to do this sort of "group" line processing (e.g., with LINQ or some other method).
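To make the intent concrete, something along the lines of the following sketch is what I have in mind for the LINQ-style approach; ProcessLine is just a placeholder for the real per-line work:

using System;
using System.IO;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        // Lazily enumerate every line of every file named on the command line
        var allLines = args.SelectMany(file => File.ReadLines(file));

        foreach (var line in allLines)
        {
            ProcessLine(line);
        }
    }

    static void ProcessLine(string line)
    {
        // Placeholder for the real per-line processing
        Console.WriteLine(line.Length);
    }
}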

Upvotes: 1

Views: 1854

Answers (3)

Michael Goldshteyn

Reputation: 74410

After much experimenting, changing this line in Darin Dimitrov's answer:

using (var stream = File.OpenRead(file))

to:

using (var stream = new FileStream(file, System.IO.FileMode.Open,
                                   System.IO.FileAccess.Read,
                                   System.IO.FileShare.ReadWrite,
                                   65536))

so that the read buffer size goes from the 4 KB default to 64 KB, can shave as much as 10% off the file read time when the file is read line by line via a StreamReader, especially if the text file is large. Larger buffer sizes did not seem to improve performance further.

This improvement is present even when reading from a relatively fast SSD; the savings are even more substantial with an ordinary hard drive. Interestingly, you get this significant performance improvement even if the file is already cached by the OS (Windows 7 / 2008 R2), which is somewhat counterintuitive.
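For completeness, here is a minimal sketch of the full line-by-line loop with the larger buffer; the 64 KB value (and the BufferSize constant name) are just what worked in my tests:

using System.IO;

class Program
{
    const int BufferSize = 65536; // 64 KB read buffer instead of the 4 KB default

    static void Main(string[] args)
    {
        foreach (var file in args)
        {
            using (var stream = new FileStream(file, FileMode.Open,
                                               FileAccess.Read,
                                               FileShare.ReadWrite,
                                               BufferSize))
            using (var reader = new StreamReader(stream))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    ProcessLine(line);
                }
            }
        }
    }

    static void ProcessLine(string line)
    {
        // Per-line work goes here
    }
}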

Upvotes: 2

Darin Dimitrov

Reputation: 1039238

You could process the files in parallel, reading each one line by line and passing each line to a processing function:

using System.IO;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        // Each file is handled on its own thread-pool task
        Parallel.ForEach(args, file =>
        {
            using (var stream = File.OpenRead(file))
            using (var reader = new StreamReader(stream))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    ProcessLine(line);
                }
            }
        });
    }

    static void ProcessLine(string line)
    {
        // TODO: process the line
    }
}

Now simply call: SomeApp.exe file1 file2 file3

Pros of this approach:

  • Files are processed in parallel => taking advantage of multiple CPU cores
  • Files are read line by line and only the current line is kept in memory, which reduces memory consumption and allows you to work with big files

Upvotes: 9

user448374

Reputation:

Simple:

foreach (var f in args)
{
    // Read the entire file into memory at once
    var filecontent = File.ReadAllText(f);
    // Logic goes here
}
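If the files are large, a line-by-line variant keeps memory usage down; a minimal sketch using File.ReadLines (available in .NET 4.0, assuming the same args array and a using System.IO directive):

foreach (var f in args)
{
    // Streams the file one line at a time instead of loading it all at once
    foreach (var line in File.ReadLines(f))
    {
        // Logic goes here
    }
}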

Upvotes: 2
