tk.

Reputation: 1216

How to improve the performance of reading bytes line by line from a FileStream

I have a file larger than 10 GB. To read it line by line, I wrote this function.

static IEnumerable<string> fread(string fname, Encoding enc)
{
  using (var f = File.OpenRead(fname))
  using (var reader = new StreamReader(f, enc))
    while (!reader.EndOfStream)
      yield return reader.ReadLine();
}

This code works pretty well, but it returns each line as a string, not as a byte[]. To return a byte[] for each line, I wrote another function.

static IEnumerable<byte[]> freadbytes(string fname) {
  using (var f = File.OpenRead(fname)) {
    var bufSz = 1024;
    var buf = new byte[bufSz];
    var read = 1;
    var cr = (byte)13; // \r
    var lf = (byte)10; // \n
    var data = new List<byte>();
    while (read > 0) {
      read = f.Read(buf, 0, bufSz);
      // Append only the bytes actually read (Take is from System.Linq).
      data.AddRange(read == bufSz ? buf : buf.Take(read));
      var i = data.IndexOf(lf);
      while (i >= 0) {
        if (i > 0 && data[i - 1] == cr) yield return data.Take(i - 1).ToArray();
        else yield return data.Take(i).ToArray();
        data.RemoveRange(0, i + 1);
        i = data.IndexOf(lf);
      }
    }
    // Emit a final line that has no trailing \n.
    if (data.Count > 0) yield return data.ToArray();
  }
}

The second function, freadbytes(), also works, but it takes more than 10 times as long as the first function. What can I do to make it faster?

Upvotes: 2

Views: 2051

Answers (2)

Mrchief

Reputation: 76258

Maybe this will help:

static IEnumerable<byte[]> fread(string fname, Encoding enc)
{
  using (var f = File.OpenRead(fname))
  using (var reader = new StreamReader(f, enc))
    while (!reader.EndOfStream)
      yield return enc.GetBytes(reader.ReadLine());
}

Update: Had missed the enc param initially.

Upvotes: 0

spender

Reputation: 120518

Although untested, I'm sure this will be considerably faster:

static IEnumerable<byte[]> fread(string fname, Encoding enc) 
{
  using (var f = File.OpenRead(fname))
  using (var reader = new StreamReader(f, enc))
    while (!reader.EndOfStream)
      yield return enc.GetBytes(reader.ReadLine());     
}
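
One caveat with enc.GetBytes(reader.ReadLine()): the bytes are decoded to a string and then re-encoded, so the result can differ from what is in the file (for example, a byte order mark is dropped and byte sequences that are invalid for enc get replaced). If the raw bytes matter, a byte-level reader along the following lines might be worth measuring. It is only a sketch: LineBytes, ReadLineBytes, TakeLine and the 64 KB default buffer are hypothetical names and values, not from the posts above. It scans each chunk in place with Array.IndexOf and collects the unfinished line in a MemoryStream, which avoids the repeated List<byte>.RemoveRange and Take(...).ToArray() calls that likely dominate the cost of freadbytes().

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LineBytes
{
    // Hypothetical helper: yields each line's raw bytes, splitting on \n and
    // dropping one preceding \r, without any text decoding.
    public static IEnumerable<byte[]> ReadLineBytes(Stream stream, int bufferSize = 64 * 1024)
    {
        var buf = new byte[bufferSize];
        var line = new MemoryStream();   // bytes of the current, unfinished line
        int read;
        while ((read = stream.Read(buf, 0, buf.Length)) > 0)
        {
            int start = 0;
            while (start < read)
            {
                int lf = Array.IndexOf(buf, (byte)10, start, read - start);
                if (lf < 0)
                {
                    // No \n in the rest of this chunk: keep the tail for the next read.
                    line.Write(buf, start, read - start);
                    break;
                }
                line.Write(buf, start, lf - start);
                yield return TakeLine(line);
                start = lf + 1;
            }
        }
        // A final line that has no trailing \n.
        if (line.Length > 0) yield return TakeLine(line);
    }

    // Copies the collected bytes (minus one trailing \r) and resets the collector.
    private static byte[] TakeLine(MemoryStream line)
    {
        var bytes = line.ToArray();
        line.SetLength(0);
        if (bytes.Length > 0 && bytes[bytes.Length - 1] == 13)
            Array.Resize(ref bytes, bytes.Length - 1);
        return bytes;
    }
}
```

Usage would look like foreach (var bytes in LineBytes.ReadLineBytes(File.OpenRead(fname))) { ... }, with the stream wrapped in a using block by the caller. Whether this beats the StreamReader version depends on the file and the encoding, so it should be timed on real data.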

Upvotes: 5
