Lee

Reputation: 149

How to read huge CSV file with 29 million rows of data using .net

I have a huge .csv file (to be specific, a .TAB file) with 29 million rows, and the file size is around 600 MB. I need to read this into an IEnumerable collection.

I have tried CsvHelper, GenericParser, and a few other solutions, but I always end up with an OutOfMemoryException.

Please suggest a way to do this.

I have tried:

var deliveryPoints = new List<Point>();

using (TextReader csvreader1 = File.OpenText(@"C:\testfile\Prod\PCDP1705.TAB")) //StreamReader csvreader1 = new StreamReader(@"C:\testfile\Prod\PCDP1705.TAB"))
using (var csvR1 = new CsvReader(csvreader1, csvconfig))
{
     csvR1.Configuration.RegisterClassMap<DeliveryMap>();
     deliveryPoints = csvR1.GetRecords<Point>().ToList();
}

using (GenericParser parser = new GenericParser())
{
     parser.SetDataSource(@"C:\testfile\Prod\PCDP1705.TAB");

     parser.ColumnDelimiter = '\t';
     parser.FirstRowHasHeader = false;
     //parser.SkipStartingDataRows = 10;
     //parser.MaxBufferSize = 4096;
     //parser.MaxRows = 500;
     parser.TextQualifier = '\"';

     while (parser.Read())
     {
         var address = new Point();
         address.PostCodeID = int.Parse(parser[0]);
         address.DPS = parser[1];
         address.OrganisationFlag = parser[2];
         deliveryPoints.Add(address);
     }
}

and

var deliveryPoints = new List<Point>();
var csvreader = new StreamReader(@"C:\testfile\Prod\PCDP1705.TAB");
var csv = new CsvReader(csvreader, csvconfig);

while (csv.Read())
{
     var address = new Point();
     address.PostCodeID = int.Parse(csv.GetField(0));
     address.DPS = csv.GetField(1);                
     deliveryPoints.Add(address);
}

Upvotes: 12

Views: 23905

Answers (3)

Liero

Reputation: 27380

The problem is that you are loading the entire file into memory. You could compile your code for x64, which significantly raises the memory limit for your program, but that is not recommended if you can avoid loading the whole file into memory in the first place.

Notice that calling ToList() forces the CsvReader to load the entire file into memory at once:

csvR1.GetRecords<Point>().ToList();

But this will load only one line at a time:

foreach(var record in csvR1.GetRecords<Point>())
{
    //do whatever with the single record
}

This way you can process files of practically unlimited size.
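If you need the result as an IEnumerable&lt;Point&gt; rather than a List, a minimal sketch of the same idea, assuming the Point, DeliveryMap and csvconfig definitions from the question, is to wrap the enumeration in an iterator method so rows are only read while the caller enumerates:

// Sketch only: Point, DeliveryMap and csvconfig come from the question and are
// assumed to exist; the CsvHelper calls mirror the question's usage.
private static IEnumerable<Point> StreamDeliveryPoints(string path, CsvConfiguration csvconfig)
{
    using (var reader = File.OpenText(path))
    using (var csv = new CsvReader(reader, csvconfig))
    {
        csv.Configuration.RegisterClassMap<DeliveryMap>();

        foreach (var record in csv.GetRecords<Point>())
        {
            // Only one record is materialized at a time.
            yield return record;
        }
    }
}

The caller can then foreach over StreamDeliveryPoints(...) directly; just avoid calling ToList() on the result, or the whole file ends up in memory again.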

Upvotes: 16

jdweng

Reputation: 34433

There is no need to use third-party software. Use the .NET library methods:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Data;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Read the file one line at a time; the .TAB file is tab-delimited.
            using (StreamReader csvreader = new StreamReader(@"C:\testfile\Prod\PCDP1705.TAB"))
            {
                string inputLine;
                while ((inputLine = csvreader.ReadLine()) != null)
                {
                    string[] csvArray = inputLine.Split('\t');
                    var address = new Point();
                    address.postCodeID = int.Parse(csvArray[0]);
                    address.DPS = csvArray[1];
                    Point.deliveryPoints.Add(address);
                }
            }

            // Copy the parsed data into a DataTable.
            DataTable dt = new DataTable();
            dt.Columns.Add("Post Code", typeof(int));
            dt.Columns.Add("DPS", typeof(string));

            foreach (Point point in Point.deliveryPoints)
            {
                dt.Rows.Add(new object[] { point.postCodeID, point.DPS });
            }
        }
    }

    public class Point
    {
        public static List<Point> deliveryPoints = new List<Point>();
        public int postCodeID { get; set; }
        public string DPS { get; set; }
    }
}
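If the rows only need to be handled once rather than kept in a list, a small variation of the same idea (a sketch only, reusing the Point type above) streams the file with File.ReadLines, which enumerates lines lazily, so memory use stays flat no matter how many rows the file has:

// Sketch: File.ReadLines reads the file lazily, one line at a time, so nothing
// forces all 29 million rows into memory. The tab delimiter and the Point type
// follow the question and the code above.
static IEnumerable<Point> ReadPoints(string path)
{
    foreach (string line in File.ReadLines(path))
    {
        string[] fields = line.Split('\t');
        yield return new Point
        {
            postCodeID = int.Parse(fields[0]),
            DPS = fields[1]
        };
    }
}

Consuming it with foreach (or any LINQ operator that does not buffer) keeps only the current row alive.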

Upvotes: 12

Lee

Reputation: 149

It worked by running in x64 mode, and by adding <gcAllowVeryLargeObjects enabled="true" /> in app.config.
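For reference, a minimal sketch of where that element lives in app.config (everything else in the file is omitted here); gcAllowVeryLargeObjects belongs inside the &lt;runtime&gt; section and allows objects larger than 2 GB on 64-bit targets:

<configuration>
  <runtime>
    <!-- Allows individual objects (e.g. very large arrays/lists) over 2 GB on x64 -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>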

Upvotes: -1
