Lev M.

Reputation: 139

Efficiently reading structured binary data from a file

I have the following code fragment that reads a binary file and validates it:

 FileStream f = File.OpenRead("File.bin");
 MemoryStream memStream = new MemoryStream();
 memStream.SetLength(f.Length);
 f.Read(memStream.GetBuffer(), 0, (int)f.Length);
 f.Seek(0, SeekOrigin.Begin);
 var r = new BinaryReader(f);
 Single prevVal=0;
 do
 {
    r.ReadUInt32();
    var val = r.ReadSingle();
    if (prevVal!=0) {
       var diff = Math.Abs(val - prevVal) / prevVal;
       if (diff > 0.25)
          Console.WriteLine("Bad!");
    }
    prevVal = val;
 }
 while (f.Position < f.Length);

Unfortunately, it runs very slowly, and I would like to speed it up. In C++, I would simply read the file into a byte array and then reinterpret that array as an array of structures:

struct S{
   int a;
   float b;
};

How would I do this in C#?

Upvotes: 5

Views: 1756

Answers (4)

Lev M.

Reputation: 139

Thank you everyone for very helpful comments and answers. Given this input, this is my preferred solution:

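   using System;
   using System.IO;
   using System.Runtime.InteropServices;

   class Program   // class name is a placeholder; the original post showed only its closing brace
   {
      // Matches the 8-byte record in the file: a 4-byte integer followed by a 4-byte float.
      [StructLayout(LayoutKind.Sequential, Pack = 1)]
      struct Data
      {
         public UInt32 dummy;
         public Single val;
      };
      static void Main(string[] args)
      {
         byte [] byteArray = File.ReadAllBytes("File.bin");
         // Reinterpret the bytes as records without copying them.
         ReadOnlySpan<Data> dataArray = MemoryMarshal.Cast<byte, Data>(new ReadOnlySpan<byte>(byteArray));
         Single prevVal=0;
         foreach( var v in dataArray) {
            if (prevVal!=0) {
               // Relative change with respect to the previous value.
               var diff = Math.Abs(v.val - prevVal) / prevVal;
               if (diff > 0.25)
                  Console.WriteLine("Bad!");
            }
            prevVal = v.val;
         }
      }
   }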

It indeed works much faster than the original implementation.
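
One thing to watch with this approach: MemoryMarshal.Cast rounds the element count down, so a file whose length is not a multiple of the record size loses its trailing bytes without any error. A minimal sketch of a check for that follows; RecordCheck and TrailingBytes are hypothetical names, and Unsafe.SizeOf assumes .NET Core 3.0 or later (or the System.Runtime.CompilerServices.Unsafe package).

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Same layout as the Data struct above: 4-byte integer, then 4-byte float.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Data
{
    public uint dummy;
    public float val;
}

public static class RecordCheck   // hypothetical helper, not part of any library
{
    // Number of bytes at the end of the buffer that do not form a complete
    // record of type T and would therefore be ignored by MemoryMarshal.Cast.
    public static int TrailingBytes<T>(ReadOnlySpan<byte> bytes) where T : struct
    {
        return bytes.Length % Unsafe.SizeOf<T>();
    }
}
```

Calling RecordCheck.TrailingBytes<Data>(byteArray) before the cast and reporting a non-zero result would make a truncated file visible instead of silently skipping its last partial record.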

Upvotes: 2

Matthew Watson

Reputation: 109832

This is what we use (compatible with older versions of C#):

public static T[] FastRead<T>(FileStream fs, int count) where T: struct
{
    int sizeOfT = Marshal.SizeOf(typeof(T));

    long bytesRemaining  = fs.Length - fs.Position;
    long wantedBytes     = (long)count * sizeOfT; // widen before multiplying to avoid int overflow
    long bytesAvailable  = Math.Min(bytesRemaining, wantedBytes);
    long availableValues = bytesAvailable / sizeOfT;
    long bytesToRead     = (availableValues * sizeOfT);

    if ((bytesRemaining < wantedBytes) && ((bytesRemaining - bytesToRead) > 0))
    {
        Debug.WriteLine("Requested data exceeds available data and partial data remains in the file.");
    }

    T[] result = new T[availableValues];

    GCHandle gcHandle = GCHandle.Alloc(result, GCHandleType.Pinned);

    try
    {
        uint bytesRead;

        if (!ReadFile(fs.SafeFileHandle, gcHandle.AddrOfPinnedObject(), (uint)bytesToRead, out bytesRead, IntPtr.Zero))
        {
            throw new IOException("Unable to read file.", new Win32Exception(Marshal.GetLastWin32Error()));
        }

        Debug.Assert(bytesRead == bytesToRead);
    }

    finally
    {
        gcHandle.Free();
    }

    GC.KeepAlive(fs);

    return result;
}

[System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Interoperability", "CA1415:DeclarePInvokesCorrectly")]
[DllImport("kernel32.dll", SetLastError=true)]
[return: MarshalAs(UnmanagedType.Bool)]

private static extern bool ReadFile
(
    SafeFileHandle       hFile,
    IntPtr               lpBuffer,
    uint                 nNumberOfBytesToRead,
    out uint             lpNumberOfBytesRead,
    IntPtr               lpOverlapped
);

NOTE: This only works for structs that contain only blittable types, of course. You must also control the layout, either with [StructLayout(LayoutKind.Sequential, Pack = 1)] or with [StructLayout(LayoutKind.Explicit)] and a [FieldOffset] on every field, to ensure that the struct layout is identical to the binary format of the data in the file.

For recent versions of C#, you can use Span as mentioned by Marc in the other answer!
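
As a rough illustration of the Span route for this same "read N structs from a stream" helper, the sketch below fills a T[] through a byte view of the array instead of calling ReadFile via P/Invoke. SpanReader and ReadStructs are hypothetical names; the code assumes .NET Core 2.1 or later (for Stream.Read(Span<byte>)) and a T without reference fields, since MemoryMarshal.AsBytes throws otherwise.

```csharp
using System;
using System.IO;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class SpanReader   // hypothetical helper, not part of any library
{
    // Reads up to `count` values of T from the stream directly into a T[].
    // Returns fewer elements if the stream ends early; a trailing partial
    // value is discarded.
    public static T[] ReadStructs<T>(Stream stream, int count) where T : struct
    {
        var result = new T[count];

        // View the array's memory as raw bytes; no copy is made.
        Span<byte> bytes = MemoryMarshal.AsBytes(result.AsSpan());

        // Stream.Read may return fewer bytes than requested, so loop.
        int total = 0;
        while (total < bytes.Length)
        {
            int n = stream.Read(bytes.Slice(total));
            if (n == 0) break;   // end of stream
            total += n;
        }

        // Keep only the values that were read completely.
        int whole = total / Unsafe.SizeOf<T>();
        if (whole < count) Array.Resize(ref result, whole);
        return result;
    }
}
```

Because it takes any Stream, the same helper works on a FileStream or a MemoryStream, and it does not depend on kernel32.dll.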

Upvotes: 1

Marc Gravell

Reputation: 1064134

Define a struct (possibly a readonly struct) with explicit layout ([StructLayout(LayoutKind.Explicit)]) that precisely matches the struct in your C++ code, then do one of the following:

  1. open the file as a memory-mapped file, get the pointer to the data; use either unsafe code on the raw pointer, or use Unsafe.AsRef<YourStruct> on the data, and Unsafe.Add<> to iterate
  2. open the file as a memory-mapped file, get the pointer to the data; create a custom memory over the pointer (of your T), and iterate over the span
  3. open the file as a byte[]; create a Span<byte> over the byte[], then use MemoryMarshal.Cast<,> to create a Span<YourType>, and iterate over that
  4. open the file as a byte[]; use fixed to pin the byte* and get a pointer; use unsafe code to walk the pointer
  5. something involving "pipelines": a Pipe that is the buffer, maybe using StreamConnection on a FileStream for filling the pipe, and a worker loop that dequeues from the pipe; complication: the buffers can be discontiguous and may split at inconvenient places; solvable, but subtle code is required whenever the first span isn't at least 8 bytes

(or some combination of those concepts)

Any of those should work much like your C++ version. The 4th is simple, but for very large data you will probably want to prefer memory-mapped files.
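
To make the memory-mapped variant more concrete, here is a minimal sketch in the spirit of option 2: map the file, take a pointer to the view, wrap it in a ReadOnlySpan of the struct type, and iterate. Record, MappedScan and CountBadJumps are hypothetical names, the struct uses sequential layout with Pack = 1 for brevity (an assumption; explicit layout works equally well), and the code needs AllowUnsafeBlocks plus C# 8 for the using declarations.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Record   // hypothetical name; mirrors the C++ struct S
{
    public uint A;
    public float B;
}

public static class MappedScan   // hypothetical helper
{
    // Counts records whose value changed by more than 25% relative to the
    // previous record, using the same test as the loop in the question.
    public static unsafe int CountBadJumps(string path)
    {
        long length = new FileInfo(path).Length;
        if (length < sizeof(Record)) return 0;   // also avoids mapping an empty file

        using var mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
        using var view = mmf.CreateViewAccessor(0, length, MemoryMappedFileAccess.Read);

        byte* ptr = null;
        view.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr);
        try
        {
            // A span is limited to int.MaxValue elements; larger files
            // would have to be processed in several views.
            int count = checked((int)(length / sizeof(Record)));
            var records = new ReadOnlySpan<Record>(ptr + view.PointerOffset, count);

            int bad = 0;
            float prev = 0;
            foreach (var r in records)
            {
                if (prev != 0 && Math.Abs(r.B - prev) / prev > 0.25) bad++;
                prev = r.B;
            }
            return bad;
        }
        finally
        {
            view.SafeMemoryMappedViewHandle.ReleasePointer();
        }
    }
}
```

The span must not be used after ReleasePointer, which is why the whole scan happens inside the try block.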

Upvotes: 4

NineBerry

Reputation: 28549

You are actually not using the MemoryStream at all currently. Your BinaryReader accesses the file directly. To have the BinaryReader use the MemoryStream instead:

Replace

f.Seek(0, SeekOrigin.Begin);
var r = new BinaryReader(f);

...

while (f.Position < f.Length);

with

memStream.Seek(0, SeekOrigin.Begin);
var r = new BinaryReader(memStream);

...

while(r.BaseStream.Position < r.BaseStream.Length);
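
Put together, the in-memory version of the question's loop might look like the sketch below. BufferedScan and CountBadJumps are hypothetical names, the file is assumed to have been loaded with File.ReadAllBytes, and the do/while is replaced by a while loop that checks for a full 8-byte record so that an empty or truncated file does not throw EndOfStreamException.

```csharp
using System;
using System.IO;

public static class BufferedScan   // hypothetical helper
{
    // Scans 8-byte records (uint followed by float) held in memory and
    // counts values that changed by more than 25% from the previous one.
    public static int CountBadJumps(byte[] fileBytes)
    {
        using var memStream = new MemoryStream(fileBytes, writable: false);
        using var r = new BinaryReader(memStream);

        int bad = 0;
        float prevVal = 0;
        while (memStream.Length - memStream.Position >= 8)
        {
            r.ReadUInt32();              // skip the integer field
            float val = r.ReadSingle();
            if (prevVal != 0)
            {
                var diff = Math.Abs(val - prevVal) / prevVal;
                if (diff > 0.25) bad++;
            }
            prevVal = val;
        }
        return bad;
    }
}
```

A call such as BufferedScan.CountBadJumps(File.ReadAllBytes("File.bin")) then reads the file once and runs the BinaryReader entirely against memory.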

Upvotes: 0
