Reputation: 33
I am assigned a project that requires a C# console application that will be used to manipulate a text file. The text file is a bcp table dump. The program should be able to:
Currently, I am reading in the file as such:
var groupQuery = from name in File.ReadAllLines(fileName)
                              .Skip(skipHeaderRow)
                 let n = name.Split(delimiterChars)
                 group name by n[index] into g
                 // orderby g.Key
                 select g;
I am afraid I might run into memory issues, since some of the files can have over 2 million records and each row is about 2617 bytes.
Upvotes: 3
Views: 1462
Reputation: 679
Try using Buffered Streams to read/write files without completely loading them into memory.
using (FileStream fs = File.Open(inputFile, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (StreamReader sr = new StreamReader(fs))
using (StreamWriter sw = new StreamWriter(outputFile))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // Split your line here, e.g. into lineA and lineB,
        // and write the pieces using the buffered writer:
        // string[] parts = line.Split('\t');
        // sw.WriteLine(parts[0]);
    }
}
(from here)
The idea is to read the file line by line without loading the entire thing into memory, split each line however you want, and then write the split lines, one by one, into your output files.
Upvotes: 1
Reputation: 14359
If you are confident that your program will only need to sequentially access... the bcp dump file, use the StreamReader class to read the file. This class is optimized for sequential access, and it opens the file as a stream, so memory issues should not bother you. Further, you can increase the buffer size of your stream by initializing from a different constructor of this class, to have a larger chunk in memory to work with.
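A minimal sketch of the larger-buffer approach described above. The file path and buffer size are placeholders; a tiny sample file is created first so the snippet runs end to end:

```csharp
using System;
using System.IO;
using System.Text;

class BigBufferRead
{
    static void Main()
    {
        // Stand-in for the real bcp dump: two tab-delimited rows.
        string path = Path.Combine(Path.GetTempPath(), "sample.bcp");
        File.WriteAllLines(path, new[] { "1\talpha", "2\tbeta" });

        const int bufferSize = 1 << 20; // 1 MB, versus the small default

        // This StreamReader constructor overload takes an explicit buffer size.
        using (var reader = new StreamReader(path, Encoding.UTF8,
                   detectEncodingFromByteOrderMarks: true, bufferSize: bufferSize))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Only the current line (plus the read buffer) is held in memory.
                string[] fields = line.Split('\t');
                Console.WriteLine(fields[1]);
            }
        }
    }
}
```

Processing stays line-at-a-time regardless of the buffer size; the bigger buffer simply reduces the number of trips to the disk.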
If you want to have random access to your file in pieces... go for Memory-Mapped Files. Make sure to create a view accessor over a limited section of the file. The example code given at the link on MMFs explains how to create a small view over a large file.
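A sketch of mapping only a limited section, as suggested above. The file name and sizes are illustrative; a dummy 8 KB file is written first so the example is self-contained:

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

class MmfView
{
    static void Main()
    {
        // Dummy 8 KB file standing in for a much larger dump.
        string path = Path.Combine(Path.GetTempPath(), "mmf_sample.bcp");
        File.WriteAllBytes(path, Encoding.ASCII.GetBytes(new string('x', 8192)));

        // Map only the first 4 KB of the file, not the whole thing.
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var view = mmf.CreateViewAccessor(offset: 0, size: 4096,
                   access: MemoryMappedFileAccess.Read))
        {
            byte first = view.ReadByte(0);
            Console.WriteLine((char)first); // prints 'x'
        }
    }
}
```

For a multi-gigabyte dump you would slide the `offset` forward, creating one small view at a time instead of mapping the entire file.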
Edit: I had the code for using MMFs in my answer but I have removed it now, as I realized... Even though group by is lazy in the sense of deferred execution, it is a non-streaming LINQ operator. Therefore, it will have to read your entire bcp dump before yielding any results, so the memory concern still applies.
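The deferred-but-non-streaming behavior of group by can be demonstrated with a small sketch (names here are illustrative): defining the query pulls nothing from the source, but asking for the first group forces the entire source to be consumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class GroupByIsNotStreaming
{
    static readonly List<int> Pulled = new List<int>();

    static IEnumerable<int> Source()
    {
        for (int i = 0; i < 3; i++)
        {
            Pulled.Add(i);   // record each element as it is consumed
            yield return i;
        }
    }

    static void Main()
    {
        var groups = Source().GroupBy(n => n % 2);
        Console.WriteLine(Pulled.Count);   // 0: defining the query is deferred...

        var firstGroup = groups.First();
        Console.WriteLine(Pulled.Count);   // 3: ...but the first group is only
                                           // available after the whole source
                                           // has been read and bucketed
    }
}
```

With a 2-million-row dump, "the whole source" means every row in memory before the first group appears.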
Upvotes: 2
Reputation: 14231
Don't reinvent the wheel. Consider using a library like FileHelpers.
http://www.filehelpers.net/example/QuickStart/ReadWriteRecordByRecord/
var engine = new FileHelperAsyncEngine<Customer>();

using (engine.BeginReadFile(fileName))
{
    var groupQuery =
        from o in engine
        group o by o.CustomerId into g
        // orderby g.Key
        select g;

    foreach (Customer cust in engine)
    {
        Console.WriteLine(cust.Name);
    }
}
You will still run into memory problems with your group and order functions because all records need to be in memory to be grouped and ordered.
Upvotes: 0