nik
nik

Reputation: 1784

Working with Data from csv file

I have a large CSV file (50000 * 25) which is essentially a data table that has numeric and alphanumeric fields.

I have used "A Fast CSV Reader" from Lumenworks on code project. (link http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader?msg=4600509#xx4600509xx)

I m looking for the most efficient way to work with the data later, I will have to sum, average etc over parts of the data.

My code so far:

static void Main()
    {
        // open the file "data.csv" which is a CSV file with headers
        using (CsvReader csv =
               new CsvReader(new StreamReader(@"c:\Temp\Extrinsic_Hourly.csv"), true))
        {
            int fieldCount = csv.FieldCount;
            csv.SupportsMultiline = false;

            List<string> filedata = new List<string>();
            string[] headers = csv.GetFieldHeaders();



            while (csv.ReadNextRecord())
            {
                for (int i = 0; i < fieldCount; i++)        
                {
                //if (headers[i] == "PowerPrice")
                    filedata.Add(csv[i]);
                }
            }


            File.WriteAllLines(@"c:\Temp\test.txt", filedata); 


        }

The last line is just to check that import worked. This works fine and is relatively fast but now I have this huge list and it is difficult to work with. If I now need to average over column 13 I m stuck how to do this.

filedata.column(13).Average()

obviously doesnt work, not least because its all strings.

It would be nicer to import the data more structured into a class, or convert entire columns of the big 2-d list into 1d lists upon which I could also convert strings into doubles if they are numeric.

Any ideas what the best way is if I have to later perform arithmitic operations on entire columns or part of a column based on criteria from a different column, for instance column 1 has the date and I might want to sum column 2 by months.

Thanks.

Upvotes: 0

Views: 864

Answers (1)

Tim Schmelter
Tim Schmelter

Reputation: 460108

The most straight forward approach would be to use a DataTable. For example:

DataTable tblCSV = new DataTable("CSV");
var fileInfo = new System.IO.FileInfo(fullPath);
var encoding = Encoding.GetEncoding(437); // use the correct encoding
using (var reader = new System.IO.StreamReader(fullPath, encoding))
{
    //reader.ReadLine(); // skip all lines but header+data
    Char quotingCharacter = '\0';//'"';
    Char escapeCharacter = quotingCharacter;
    using (var csv = new CsvReader(reader, true, Importer.FieldDelimiter, quotingCharacter, escapeCharacter, '\0', ValueTrimmingOptions.All))
    {
        csv.MissingFieldAction = MissingFieldAction.ParseError;
        csv.DefaultParseErrorAction = ParseErrorAction.RaiseEvent;
        csv.ParseError += csv_ParseError;
        csv.SkipEmptyLines = true;
        try
        {
            // load into DataTable
            tblCSV.Load(csv, LoadOption.OverwriteChanges, csvTable_FillError);

Then you can use Linq-To-DataSet:

double avg = tblCSV.AsEnumerable()
    .Select(r => int.Parse(r.Field<string>(13)))
    .Average();

Upvotes: 2

Related Questions