Michael Miller

Reputation: 33

Manipulating a text file in C#

I am assigned a project that requires a C# console application that will be used to manipulate a text file. The text file is a bcp table dump. The program should be able to:

  1. Split the file into multiple files based on a column given by the user
  2. Include or exclude the split column from the output

Currently, I am reading in the file as such:

var groupQuery = from name in File.ReadAllLines(fileName)
                                  .Skip(skipHeaderRow)
                 let n = name.Split(delimiterChars)
                 group name by n[index] into g
                 // orderby g.Key
                 select g;

I am afraid I might run into memory issues, since some of the files can have over 2 million records and each row is about 2617 bytes.

Upvotes: 3

Views: 1462

Answers (3)

Shreyas Kapur

Reputation: 679

Try using Buffered Streams to read/write files without completely loading them into memory.

using (FileStream fs = File.Open(inputFile, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (StreamReader sr = new StreamReader(fs))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // Split your line here into the pieces you need
        // and write them using a buffered writer.
    }
}

(from here)

The idea is to read the file line by line without loading the entire thing into memory, split each line however you want, and then write the split lines, line by line, into your output files.
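Putting that together, here is one way the split might be sketched. The file name, delimiter, and column index below are assumptions (the question reads them from the user), and one StreamWriter is kept open per distinct key so each line is written as soon as it is read:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class Splitter
{
    static void Main()
    {
        const int keyIndex = 2;      // assumed split-column index
        const char delimiter = '\t'; // assumed bcp field terminator
        var writers = new Dictionary<string, StreamWriter>();

        using (var sr = new StreamReader("dump.bcp"))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                string key = line.Split(delimiter)[keyIndex];
                if (!writers.TryGetValue(key, out StreamWriter w))
                {
                    w = new StreamWriter(key + ".bcp");
                    writers[key] = w;
                }
                // Only the current line is held in memory; to exclude the
                // split column, rebuild the line without fields[keyIndex].
                w.WriteLine(line);
            }
        }

        foreach (var w in writers.Values)
            w.Dispose();
    }
}
```

This avoids grouping entirely, so memory use stays proportional to the number of distinct keys (open writers), not the number of rows.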

Upvotes: 1

displayName

Reputation: 14359

If you are confident that your program will only need to sequentially access... the bcp dump file, use the StreamReader class to read the file. This class is optimized for sequential access and it opens the file as a stream, so memory issues should not bother you. Further, you can increase the buffer size of your stream by initializing from a different constructor of this class to have a larger chunk in memory to work with.
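For example, one of the StreamReader constructors accepts a buffer size. The 64 KB below and the file name are arbitrary illustrative choices:

```csharp
using System;
using System.IO;
using System.Text;

class Program
{
    static void Main()
    {
        // A larger buffer than the small default; tune to your workload.
        using (var sr = new StreamReader("dump.bcp", Encoding.UTF8,
                                         detectEncodingFromByteOrderMarks: true,
                                         bufferSize: 64 * 1024))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                // Process one line at a time; only the buffer is resident.
            }
        }
    }
}
```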


If you want to have random access to your file in pieces... go for Memory Mapped Files. Make sure to create a view accessor over a limited section of the file. The example code given at the link of MMFs explains how to create a small view over a large file.
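A minimal sketch of such a limited view (the file name, offset, and window size are illustrative assumptions):

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

class Program
{
    static void Main()
    {
        using (var mmf = MemoryMappedFile.CreateFromFile("dump.bcp", FileMode.Open))
        // Map only a 4 KB window starting 1 MB into the file,
        // rather than mapping the whole file at once.
        using (var accessor = mmf.CreateViewAccessor(1024 * 1024, 4096))
        {
            byte b = accessor.ReadByte(0); // offsets are relative to the view
        }
    }
}
```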


Edit: I had the code for using MMFs in my answer but I have removed it now as I realized... Even though group by is lazy in reality, it is also a non-streaming LINQ operator. Therefore, it will have to read your entire bcp dump before it can finally give you the results. This implies:

  1. StreamReader is clearly the better approach for you. Make sure you increase the buffer to the maximum possible;
  2. Your LINQ query will stall when it hits the group by operator and will only come back to life after the entire file has been read.

Upvotes: 2

joelnet

Reputation: 14231

Don't reinvent the wheel. Consider using a library like FileHelpers.

http://www.filehelpers.net/example/QuickStart/ReadWriteRecordByRecord/

var engine = new FileHelperAsyncEngine<Customer>();

using (engine.BeginReadFile(fileName))
{
    var groupQuery =
        from o in engine
        group o by o.CustomerId into g
        // orderby g.Key
        select g;

    foreach (var g in groupQuery)
    {
        foreach (Customer cust in g)
        {
            Console.WriteLine(cust.Name);
        }
    }
}

You will still run into memory problems with the group and order operations, because all records need to be in memory to be grouped and ordered.

Upvotes: 0

Related Questions