Reputation: 135
I have a text file that could potentially have up to 1 million lines in it, and I have code that reads the file one line at a time, but this is taking a lot of time... lots and lots of time. Is there a way in C# to optimize this process and improve the reading? This is the code I'm using:
string line;
using (var file = new StreamReader(filePath))
{
    while ((line = file.ReadLine()) != null)
    {
        // do something with the line
    }
}
Any suggestions on reading these lines in bulk or improving the process?
Thanks.
Thanks for all your comments. The issue had to do with the // do something part, where I was using the SmartXls library to write to Excel, which was causing the bottleneck. I have contacted the developers to address the issue. All the suggested solutions will work in other scenarios.
Upvotes: 3
Views: 2540
Reputation: 32787
If space is not an issue, create a buffer of around 1 MB:
using (BufferedStream bs = new BufferedStream(File.OpenRead(path), 1024 * 1024))
{
    byte[] buffer = new byte[1024 * 1024];
    int read;
    while ((read = bs.Read(buffer, 0, buffer.Length)) > 0)
    {
        // play with the buffer; only the first 'read' bytes are valid
    }
}
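If you still need line-by-line access, you could wrap the BufferedStream in a StreamReader so the larger buffer sits underneath; this is just a sketch of that variant, not part of the answer above:
using (var bs = new BufferedStream(File.OpenRead(path), 1024 * 1024))
using (var reader = new StreamReader(bs))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // process each line, backed by the 1 MB buffer
    }
}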
Upvotes: 0
Reputation: 3526
To improve performance, consider spreading the work you are currently doing in your loop across multiple threads, for example with Parallel.ForEach:
Parallel.ForEach(File.ReadLines(filePath), line =>
{
    // do your business with each line
});
Upvotes: 0
Reputation: 12524
You can read more data at once using StreamReader's int ReadBlock(char[] buffer, int index, int count) rather than going line by line. This avoids reading the entire file at once (as File.ReadAllLines does) while still letting you process larger chunks in RAM at a time.
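A rough sketch of what that could look like (the 64 KB chunk size is arbitrary, and splitting chunks back into lines is not shown):
using (var reader = new StreamReader(filePath))
{
    char[] buffer = new char[64 * 1024];
    int read;
    while ((read = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
    {
        // process the first 'read' characters in buffer;
        // note that a line may span two consecutive chunks
    }
}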
Upvotes: 0
Reputation: 4919
Try reading the whole file into memory in one go with a FileStream and see if it's faster:
string filePath = "";
string fileData = "";
using (FileStream fs = new FileStream(filePath, FileMode.Open))
{
    byte[] data = new byte[fs.Length];
    fs.Read(data, 0, (int)fs.Length);
    // assumes the file is UTF-16; use the encoding that matches your file
    fileData = System.Text.Encoding.Unicode.GetString(data);
}
Upvotes: 0
Reputation: 1499770
Well, this code would be simpler. If you're using .NET 4 or later you can use File.ReadLines:
foreach (var line in File.ReadLines(filePath))
{
    // Do something
}
Note that this is not the same as ReadAllLines, as ReadLines returns an IEnumerable<string> which reads lines lazily, instead of reading the whole file in one go.
The effect at execution time will be broadly the same as your original code (it won't improve performance) - this is just simpler to read.
Fundamentally, if you're reading a large file, that can take a long time - but reading just a million lines shouldn't take "lots and lots of time". My guess is that whatever you're doing with the lines takes a long time. You might want to parallelize that, potentially using a producer/consumer queue (e.g. via BlockingCollection) or TPL Dataflow, or just use Parallel LINQ, Parallel.ForEach etc.; a rough sketch of the producer/consumer approach is below.
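A minimal sketch of the producer/consumer idea with BlockingCollection (the bounded capacity, the four consumers, and the use of Task.Run, which needs .NET 4.5+, are just illustrative choices):
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

static void ProcessFile(string filePath)
{
    // Bounded queue so the reader can't race too far ahead of the workers.
    var lines = new BlockingCollection<string>(boundedCapacity: 10000);

    // Producer: read lines lazily and queue them.
    var producer = Task.Run(() =>
    {
        foreach (var line in File.ReadLines(filePath))
        {
            lines.Add(line);
        }
        lines.CompleteAdding();
    });

    // Consumers: a few worker tasks pull lines off the queue.
    var consumers = Enumerable.Range(0, 4)
        .Select(_ => Task.Run(() =>
        {
            foreach (var line in lines.GetConsumingEnumerable())
            {
                // do something with the line
            }
        }))
        .ToArray();

    producer.Wait();
    Task.WaitAll(consumers);
}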
You should use a profiler to work out where the time is being spent. If you're reading from a very slow file system, then it's possible that it really is the reading which is taking the time. We don't have enough information to guide you on that, but you should be able to narrow it down yourself.
Upvotes: 6
Reputation: 1
You can also use File.ReadAllLines(filePath) to load the file into an array of lines, like this:
string[] lines = System.IO.File.ReadAllLines(@"path");
Upvotes: -2