Reputation: 26766
I'm attempting to read huge CSV files (50M+ rows, ~30 columns, several gigabytes per file).
This will be run on business desktop-spec machines, so loading the file into memory isn't going to cut it. Streaming rows as they're parsed seems to be the sanest option.
To make things slightly more interesting, I only need 2 of the columns in the file, but the ordering of fields is not guaranteed and has to be derived from column headings.
As such, an iterator that returns an array per row, or similar, would be excellent.
I can't just split on line breaks, as some field values may span multiple lines. I'd prefer to avoid manually checking which fields are quoted, unescaping as appropriate, and so on.
Is there anything in the framework that will do this for me? If not, can someone give me some hints on how best to approach this?
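To make the requirement concrete, here is a rough sketch of the usage shape I'm after. The parser type, its methods, and the Process call are all made-up names illustrating the idea, not a real API:

// Hypothetical: "SomeStreamingCsvParser", "ReadRecord", and "Process"
// are placeholder names, not an actual framework type.
using (var parser = new SomeStreamingCsvParser("huge.csv"))
{
    // The first record holds the column headings; derive the two
    // needed indices from it, since field ordering isn't guaranteed.
    string[] headers = parser.ReadRecord();
    int firstCol = Array.IndexOf(headers, "WantedColumnA");
    int secondCol = Array.IndexOf(headers, "WantedColumnB");

    // Then stream one fully parsed record (a string[] per row) at a
    // time, never holding the whole file in memory.
    string[] record;
    while ((record = parser.ReadRecord()) != null)
    {
        Process(record[firstCol], record[secondCol]);
    }
}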
Upvotes: 0
Views: 336
Reputation: 6326
You can try Cinchoo ETL, an open-source library to read and write CSV files.
// Stream the file record-by-record; WithFirstLineHeader() maps fields
// by heading name, and only the two configured fields are read out.
using (var reader = new ChoCSVReader("test.csv").WithFirstLineHeader()
    .WithField("Field1")
    .WithField("Field2")
    )
{
    // Each item is one streamed record exposing the selected fields.
    foreach (dynamic item in reader)
    {
        Console.WriteLine(item.Field1);
        Console.WriteLine(item.Field2);
    }
}
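Since the reader parses and yields one record at a time rather than loading the whole file, memory use should stay flat even on multi-gigabyte input. For the multi-line quoted values mentioned in the question, check the library's documentation for the relevant quoting/EOL-in-data settings.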
Please check out the CodeProject articles on how to use it.
Hope it meets your needs.
Disclaimer: I'm the author of this library
Upvotes: 1