John Pramanti

Reputation: 135

Read text file with lots of lines in C#

I have a text file that could potentially have up to a million lines in it, and I have code that reads the file one line at a time, but this is taking a lot of time... lots and lots of time. Is there a way in C# to optimize this process and improve the reading? This is the code I'm using:

string line;
using (var file = new StreamReader(filePath))
{
    while ((line = file.ReadLine()) != null)
    {
        // do something
    }
}

Any suggestions on reading these lines in bulk or improving the process?

Thanks.

Thanks for all your comments. The issue had to do with the // do something part, where I was using the SmartXls library to write to Excel, which was causing the bottleneck. I have contacted the developers to address the issue. All the suggested solutions will work in other scenarios.

Upvotes: 3

Views: 2540

Answers (6)

Anirudha

Reputation: 32787

If space is not an issue, create a buffer of around 1 MB:

using (BufferedStream bs = new BufferedStream(File.OpenRead(path), 1024 * 1024))
{
    byte[] buffer = new byte[1024 * 1024]; // 1 MB chunks
    int read;
    while ((read = bs.Read(buffer, 0, buffer.Length)) != 0)
    {
        // Process buffer[0..read); note that a line of text may be
        // split across two reads, so handle partial lines yourself.
    }
}

Upvotes: 0

Rob G

Reputation: 3526

To improve performance, consider performing whatever work you are currently doing in your loop in parallel across multiple threads:

Parallel.ForEach(File.ReadLines(filePath), line =>
{
    // do your business
});

Upvotes: 0

jltrem

Reputation: 12524

You can read more data at once using StreamReader's int ReadBlock(char[] buffer, int index, int count) rather than reading line by line. This avoids reading the entire file at once (File.ReadAllLines) but lets you process larger chunks in RAM at a time.
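
For illustration, here is a minimal sketch of that approach; the 64 KB buffer size and the input.txt file name are assumptions for the example, not part of the answer:

using System;
using System.IO;

class ChunkedReader
{
    static void Main()
    {
        char[] buffer = new char[64 * 1024]; // arbitrary chunk size, assumed for the example
        using (var reader = new StreamReader("input.txt"))
        {
            int charsRead;
            // ReadBlock fills the buffer as fully as it can and returns the
            // number of characters read; 0 signals end of file.
            while ((charsRead = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
            {
                // Process buffer[0..charsRead); a line may be split across
                // two chunks, so any line-based logic must stitch partial
                // lines back together.
                Console.WriteLine("Read " + charsRead + " characters");
            }
        }
    }
}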

Upvotes: 0

User2012384

Reputation: 4919

Try reading the whole file into memory in one go with a FileStream, see if it's faster:

string filePath = "";
string fileData = "";
using (FileStream fs = new FileStream(filePath, FileMode.Open))
{
    byte[] data = new byte[fs.Length];
    fs.Read(data, 0, (int)fs.Length); // read the whole file in one call
    fileData = System.Text.Encoding.Unicode.GetString(data); // assumes UTF-16 text
}

Upvotes: 0

Jon Skeet

Reputation: 1499770

Well, this code would be simpler: if you're using .NET 4 or later, you can use File.ReadLines:

foreach (var line in File.ReadLines(filePath))
{
    // Do something
}

Note that this is not the same as ReadAllLines, as ReadLines returns an IEnumerable<string> which reads lines lazily, instead of reading the whole file in one go.

The effect at execution time will be broadly the same as your original code (it won't improve performance) - this is just simpler to read.

Fundamentally, if you're reading a large file, that can take a long time - but reading just a million lines shouldn't take "lots and lots of time". My guess is that whatever you're doing with the lines takes a long time. You might want to parallelize that, potentially using a producer/consumer queue (e.g. via BlockingCollection) or TPL Dataflow, or just use Parallel LINQ, Parallel.ForEach etc.
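
For example, here is a minimal producer/consumer sketch using BlockingCollection; the file path, the queue capacity, and the ProcessLine placeholder are all assumptions for illustration, and Task.Run requires .NET 4.5:

using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class ProducerConsumer
{
    static void Main()
    {
        // A bounded queue keeps the reader from racing far ahead of the worker.
        var lines = new BlockingCollection<string>(boundedCapacity: 1000);

        // Producer: read lines lazily and hand them to the queue.
        var producer = Task.Run(() =>
        {
            foreach (var line in File.ReadLines("input.txt")) // assumed path
            {
                lines.Add(line);
            }
            lines.CompleteAdding(); // tell the consumer no more items are coming
        });

        // Consumer: GetConsumingEnumerable blocks until items arrive and ends
        // once CompleteAdding has been called and the queue is drained.
        var consumer = Task.Run(() =>
        {
            foreach (var line in lines.GetConsumingEnumerable())
            {
                ProcessLine(line); // the expensive "do something" goes here
            }
        });

        Task.WaitAll(producer, consumer);
    }

    static void ProcessLine(string line)
    {
        // Placeholder for the real per-line work.
    }
}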

You should use a profiler to work out where the time is being spent. If you're reading from a very slow file system, then it's possible that it really is the reading which is taking the time. We don't have enough information to guide you on that, but you should be able to narrow it down yourself.

Upvotes: 6

john doe

Reputation: 1

You can also use ReadAllLines(filePath) to load the file into an array of lines, like this: string[] lines = System.IO.File.ReadAllLines(@"path");

Upvotes: -2
