Reputation: 1216
I have a file larger than 10 GB. To read this file line by line, I wrote this function.
static IEnumerable<string> fread(string fname, Encoding enc)
{
    using (var f = File.OpenRead(fname))
    using (var reader = new StreamReader(f, enc))
        while (!reader.EndOfStream)
            yield return reader.ReadLine();
}
This code works pretty well, but it yields each line as a string, not as a byte[]. To get a byte[] for each line, I wrote another function.
static IEnumerable<byte[]> freadbytes(string fname) {
    using (var f = File.OpenRead(fname)) {
        var bufSz = 1024;
        var buf = new byte[bufSz];
        var read = 1;
        var cr = (byte)13; // \r
        var lf = (byte)10; // \n
        var data = new List<byte>(); // bytes read but not yet returned
        while (read > 0) {
            read = f.Read(buf, 0, bufSz);
            data.AddRange(buf.Take(read)); // append only the bytes actually read
            var i = data.IndexOf(lf);
            while (i >= 0) {
                // Drop the \r of a \r\n pair so it is not part of the line
                if (i > 0 && data[i - 1] == cr) yield return data.Take(i - 1).ToArray();
                else yield return data.Take(i).ToArray();
                data.RemoveRange(0, i + 1);
                i = data.IndexOf(lf);
            }
        }
        // Return a final line that has no trailing \n
        if (data.Count > 0) yield return data.ToArray();
    }
}
The second function, freadbytes(), also works, but it takes more than 10 times as long as the first function. What can I do to make the second function faster?
Upvotes: 2
Views: 2051
Reputation: 76258
Maybe this will help:
static IEnumerable<byte[]> fread(string fname, Encoding enc)
{
    using (var f = File.OpenRead(fname))
    using (var reader = new StreamReader(f, enc))
        while (!reader.EndOfStream)
            yield return enc.GetBytes(reader.ReadLine());
}
Update: I had missed the enc param initially.
Upvotes: 0
Reputation: 120518
Although untested, I'm sure this will be considerably faster:
static IEnumerable<byte[]> fread(string fname, Encoding enc)
{
    using (var f = File.OpenRead(fname))
    using (var reader = new StreamReader(f, enc))
        while (!reader.EndOfStream)
            yield return enc.GetBytes(reader.ReadLine());
}
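If the exact bytes of the file matter, note that decoding each line and encoding it again does not necessarily reproduce them (for example, a UTF-8 StreamReader replaces invalid byte sequences by default). A minimal sketch of a byte-level alternative follows, assuming lines end in \n or \r\n. The names RawLineReader, ReadLineBytes, and TakeLine and the 64 KB buffer size are hypothetical choices for illustration, not taken from the question, and the sketch has not been benchmarked against the 10 GB file. It searches each buffer with Array.IndexOf and collects a partial line in a MemoryStream, rather than searching and shrinking a List<byte>.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class RawLineReader
{
    // Hypothetical helper: yields the raw bytes of each line without decoding.
    // Splits on \n and drops a single \r directly before it.
    public static IEnumerable<byte[]> ReadLineBytes(Stream input, int bufSz = 1 << 16)
    {
        var buf = new byte[bufSz];
        var pending = new MemoryStream(); // bytes of the line currently being assembled
        int read;
        while ((read = input.Read(buf, 0, buf.Length)) > 0)
        {
            int start = 0;
            int i;
            // Search only the part of the buffer that has not been consumed yet.
            while ((i = Array.IndexOf(buf, (byte)10, start, read - start)) >= 0)
            {
                pending.Write(buf, start, i - start);
                yield return TakeLine(pending);
                start = i + 1;
            }
            // Keep the tail of the buffer; the line continues in the next read.
            pending.Write(buf, start, read - start);
        }
        // Final line without a trailing \n.
        if (pending.Length > 0) yield return TakeLine(pending);
    }

    // Copies the collected bytes out, resets the stream, and strips a trailing \r.
    static byte[] TakeLine(MemoryStream pending)
    {
        var line = pending.ToArray();
        pending.SetLength(0);
        pending.Position = 0;
        if (line.Length > 0 && line[line.Length - 1] == 13)
            Array.Resize(ref line, line.Length - 1);
        return line;
    }
}
```

It could be used as `foreach (var line in RawLineReader.ReadLineBytes(File.OpenRead(fname))) { ... }`, with the stream wrapped in a using block as in the functions above.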
Upvotes: 5