oguzh4n

Reputation: 682

Processing large text file in C#

I have 4 GB+ text files (CSV format) and I want to process them using LINQ in C#.

I run complex LINQ queries after loading the CSV and converting each row to a class.

But the file is 4 GB, and the application's memory usage grows to about double the size of the file.

How can I process such large files (run LINQ queries and produce new results)?

Thanks

Upvotes: 5

Views: 7202

Answers (3)

Alex Aza

Reputation: 78477

Instead of loading the whole file into memory, you could read and process it line by line.

using (var streamReader = new StreamReader(fileName))
{
    string line;
    while ((line = streamReader.ReadLine()) != null)
    {
        // analyze the line here
        // and throw it away if it does not match
    }
}
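The same idea also fits LINQ directly: File.ReadLines (available since .NET 4.0) returns a lazily evaluated IEnumerable<string>, so a query over it pulls one line at a time instead of materializing the whole file. A minimal sketch follows; the Sale class, its two columns, the comma delimiter and the header row are hypothetical assumptions, since the question does not describe the CSV layout.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical row type: the real CSV columns are not given in the question.
public sealed class Sale
{
    public string Region { get; set; } = "";
    public decimal Amount { get; set; }
}

public static class StreamingCsv
{
    // File.ReadLines (unlike File.ReadAllLines) is lazy: each line is read
    // from disk only when the query asks for the next element.
    public static IEnumerable<Sale> ReadSales(string path)
    {
        return File.ReadLines(path)
            .Skip(1) // assumption: the first line is a header
            .Select(line => line.Split(','))
            .Select(f => new Sale
            {
                Region = f[0],
                Amount = decimal.Parse(f[1], CultureInfo.InvariantCulture)
            });
    }

    // Where and Sum are streaming operators, so only one Sale object
    // is alive at a time regardless of the file size.
    public static decimal TotalForRegion(string path, string region)
    {
        return ReadSales(path)
            .Where(s => s.Region == region)
            .Sum(s => s.Amount);
    }
}
```

Operators such as Where, Select and Sum stream their input, while OrderBy, GroupBy and ToList buffer all of it, so the memory footprint of a "complex" query depends on which operators it uses.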

[EDIT]

If you need to run complex queries against the data in the file, the right thing to do is to load the data into a database and let the DBMS take care of data retrieval and memory management.

Upvotes: 12

Rune FS

Reputation: 21752

If you are using .NET 4.0, you could use Clay and write a method that returns an IEnumerable with one record per line. That makes code like the following possible:

from record in GetRecords("myFile.csv", new[] { "Foo", "Bar" }, new[] { "," })
where record.Foo == "Baz"
select new { MyRealBar = int.Parse(record.Bar) }

The method that projects the CSV into a sequence of Clay objects could look like this:

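private IEnumerable<dynamic> GetRecords(
    string filePath,
    IEnumerable<string> columnNames,
    string[] delimiter) {
    // requires System, System.Collections.Generic, System.IO, System.Linq
    // and a reference to the Clay library (ClayFactory)
    if (!File.Exists(filePath))
        yield break;
    var columns = columnNames.ToArray();
    dynamic New = new ClayFactory();
    using (var streamReader = new StreamReader(filePath)) {
        var columnLength = columns.Length;
        string line;
        // lazy: each record is built and yielded only when the caller asks for it
        while ((line = streamReader.ReadLine()) != null) {
            var record = New.Record();
            // note: a plain Split does not handle quoted fields that contain the delimiter
            var fields = line.Split(delimiter, StringSplitOptions.None);
            if (fields.Length != columnLength)
                throw new InvalidOperationException(
                    "fields count does not match column count");
            // one dynamic member per column name, so record.Foo works in the query
            for (int i = 0; i < columnLength; i++) {
                record[columns[i]] = fields[i];
            }
            yield return record;
        }
    }
}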
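If adding Clay is not an option, a similar dynamic record can be built with System.Dynamic.ExpandoObject, which ships with .NET 4.0 itself. This is a swapped-in technique rather than the Clay approach above, and the DynamicCsv class name is hypothetical; it is a sketch that keeps the same signature and the same one-record-per-line laziness.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;
using System.IO;
using System.Linq;

public static class DynamicCsv
{
    // Same shape as the Clay-based GetRecords, but backed by ExpandoObject.
    public static IEnumerable<dynamic> GetRecords(
        string filePath, IEnumerable<string> columnNames, string[] delimiter)
    {
        if (!File.Exists(filePath))
            yield break;
        var columns = columnNames.ToArray();
        using (var reader = new StreamReader(filePath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                var fields = line.Split(delimiter, StringSplitOptions.None);
                if (fields.Length != columns.Length)
                    throw new InvalidOperationException(
                        "fields count does not match column count");

                // ExpandoObject implements IDictionary<string, object>, so members
                // can be added by name and later read as record.Foo via dynamic.
                var record = new ExpandoObject();
                var members = (IDictionary<string, object>)record;
                for (int i = 0; i < columns.Length; i++)
                    members[columns[i]] = fields[i];
                yield return record;
            }
        }
    }
}
```

The query shown earlier in this answer should work unchanged against DynamicCsv.GetRecords, because record.Foo and record.Bar resolve through ExpandoObject's dynamic members.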

Upvotes: 1

Gans

Reputation: 1020

I think this one is a good way: CSV
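One reason a dedicated CSV parser helps: a plain string.Split, as used in the Clay example, breaks a quoted field such as "Smith, John" into two pieces. The sketch below is a minimal quote-aware line splitter, not the linked library; it assumes RFC 4180-style quoting (fields wrapped in double quotes, with "" as an escaped quote) and that no field contains a line break. The CsvLine name is hypothetical.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Splits one CSV line, honouring double-quoted fields and "" escapes.
    public static List<string> Split(string line, char delimiter = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    // A doubled quote inside a quoted field is a literal quote.
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"');
                        i++;
                    }
                    else
                    {
                        inQuotes = false; // closing quote
                    }
                }
                else
                {
                    current.Append(c); // delimiters inside quotes are kept as text
                }
            }
            else if (c == '"')
            {
                inQuotes = true; // opening quote
            }
            else if (c == delimiter)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString()); // last field (may be empty)
        return fields;
    }
}
```

It can be dropped into the line-by-line loops above in place of line.Split, so the file is still streamed one line at a time.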

Upvotes: 1
