Kris

Reputation: 23569

Load dataset from HDF5 file in C#

I'm trying to load a dataset from an HDF5 file in C# (.NET Framework) so that I end up with its contents in an array, e.g. float[,]. I found the HDF.PInvoke library, but I'm struggling to figure out how to use it.

Update

Based on Soonts' answer, I managed to get it to work. Here's my working snippet:

using System;
using System.Runtime.InteropServices;
using HDF.PInvoke;

namespace MyNamespace
{
    class Program
    {
        static void Main()
        {
            string datasetPath = "/dense1/dense1/kernel:0";
            long fileId = H5F.open(@"\path\to\weights.h5", H5F.ACC_RDONLY);
            long dataSetId = H5D.open(fileId, datasetPath);
            long typeId = H5D.get_type(dataSetId);

            // read array (rank and shape may be queried with
            // H5S.get_simple_extent_ndims and H5S.get_simple_extent_dims)
            float[,] arr = new float[162, 128];
            GCHandle gch = GCHandle.Alloc(arr, GCHandleType.Pinned);
            try
            {
                H5D.read(dataSetId, typeId, H5S.ALL, H5S.ALL, H5P.DEFAULT,
                         gch.AddrOfPinnedObject());
            }
            finally
            {
                gch.Free();
            }

            // show one entry
            Console.WriteLine(arr[13, 87].ToString());

            // release the HDF5 handles
            H5T.close(typeId);
            H5D.close(dataSetId);
            H5F.close(fileId);

            // Keep the console window open in debug mode.
            Console.WriteLine("Press any key to exit.");
            Console.ReadKey();
        }
    }
}

Original first attempt:

What I've managed so far:

using System;
using System.IO;
using System.Runtime.InteropServices;
using HDF.PInvoke;

namespace MyNamespace
{
    class Program
    {
        static void Main()
        {
            string datasetPath = "/dense1/dense1/bias:0";
            long fileId = H5F.open(@"\path\to\weights.h5", H5F.ACC_RDONLY);
            long dataSetId = H5D.open(fileId, datasetPath);
            long typeId = H5D.get_type(dataSetId);
            long spaceId = H5D.get_space(dataSetId);

            // not sure about this
            TextWriter tw = Console.Out;
            GCHandle gch = GCHandle.Alloc(tw);

            // I was hoping that this would write to the Console, but the
            // program crashes outside the scope of the C# debugger.
            H5D.read(
                dataSetId,
                typeId,
                H5S.ALL,
                H5S.ALL,
                H5P.DEFAULT,
                GCHandle.ToIntPtr(gch)
            );

            // Keep the console window open in debug mode.
            Console.WriteLine("Press any key to exit.");
            Console.ReadKey();
        }
    }
}

The signature for H5D.read() is:

Type    Name            Description
--------------------------------------------------------------
long    dset_id         Identifier of the dataset read from.
long    mem_type_id     Identifier of the memory datatype.
long    mem_space_id    Identifier of the memory dataspace.
long    file_space_id   Identifier of the dataset's dataspace in the file.
long    plist_id        Identifier of a transfer property list for this I/O operation.
IntPtr  buf             Buffer to receive data read from file.

Question

Could anyone help me fill in the blanks here?

Upvotes: 1

Views: 5640

Answers (2)

SOG

Reputation: 912

Alternatively, you may want to take a look at HDFql, as it spares you the low-level details of HDF5. The solution you posted above can be rewritten and simplified with HDFql as follows:

using System;
using System.Runtime.InteropServices;
using AS.HDFql;   // use HDFql namespace (make sure it can be found by the C# compiler)

namespace MyNamespace
{
    class Program
    {
        static void Main()
        {
            // dims
            int h = 162;
            int w = 128;

            // read array
            float[] arrFlat = new float[h * w];

            HDFql.Execute("SELECT FROM \\path\\to\\weights.h5 \"/dense1/dense1/kernel:0\" INTO MEMORY " + HDFql.VariableTransientRegister(arrFlat));        

            // reshape
            float[,] arr = new float[h, w];  // row-major
            for (int i = 0; i < h; i++)
            {
                for (int j = 0; j < w; j++)
                {
                    arr[i, j] = arrFlat[i * w + j];
                }
            }

            // show one entry
            Console.WriteLine(arr[13, 87].ToString());
            Console.WriteLine(arrFlat[13 * w + 87].ToString());

            // Keep the console window open in debug mode.
            Console.WriteLine("Press any key to exit.");
            Console.ReadKey();
        }
    }
}

Additional examples on how to read datasets using HDFql can be found in the quick start guide and reference manual.

Upvotes: 0

Soonts

Reputation: 21936

You need to create an array (a regular 1D one, not a 2D one) of the correct size and element type. Then write something like this:

int width = 1920, height = 1080;
float[] data = new float[ width * height ];
var gch = GCHandle.Alloc( data, GCHandleType.Pinned );
try
{
    H5D.read( /* skipped */, gch.AddrOfPinnedObject() );
}
finally
{
    gch.Free();
}

This will read the dataset into the data array. You can then copy individual rows into a separate 2D array if you need one.
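
If you do want a 2D array afterwards, here is a minimal sketch of that copy step. It assumes the flat buffer is in row-major order (which is how HDF5 lays out data and how C# lays out float[,]); the ArrayReshape class and To2D method are names made up for illustration, not part of any library.

```csharp
using System;

static class ArrayReshape
{
    // Copies a flat, row-major buffer into a new [height, width] array.
    public static float[,] To2D(float[] flat, int height, int width)
    {
        if (flat.Length != height * width)
            throw new ArgumentException("Buffer length does not match height * width.");

        var result = new float[height, width];
        // Buffer.BlockCopy counts in bytes, not in elements.
        Buffer.BlockCopy(flat, 0, result, 0, flat.Length * sizeof(float));
        return result;
    }
}
```

With the snippet above you would call it as ArrayReshape.To2D(data, height, width); element [y, x] of the result then equals data[y * width + x].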

Consult the API documentation for how to get the number of dimensions (HDF5 supports datasets of arbitrary rank) and the size of the dataset (for a 2D dataset, the size is two integers), i.e. how to find out the buffer size you need (for a 2D dataset, it's width * height).
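
As a rough sketch of that lookup with HDF.PInvoke, assuming the 1.10 bindings where identifiers are long and extents are ulong: the DatasetShape class and its method names are hypothetical, and passing null for the maximum-dimensions argument is my assumption about how the binding marshals an optional array, so check it against your version.

```csharp
using System;
using HDF.PInvoke;

static class DatasetShape
{
    // Returns the extent of each dimension of an already opened dataset.
    public static ulong[] GetDims(long dataSetId)
    {
        long spaceId = H5D.get_space(dataSetId);
        try
        {
            int rank = H5S.get_simple_extent_ndims(spaceId);
            if (rank < 0)
                throw new InvalidOperationException("Could not query the dataspace rank.");

            var dims = new ulong[rank];
            // Third argument receives the maximum extents; null (assumed allowed) skips it.
            H5S.get_simple_extent_dims(spaceId, dims, null);
            return dims;
        }
        finally
        {
            H5S.close(spaceId);
        }
    }

    // Number of elements the read buffer must hold: the product of all extents.
    public static long ElementCount(ulong[] dims)
    {
        long count = 1;
        foreach (ulong d in dims)
            count = checked(count * (long)d);
        return count;
    }
}
```

For a 2D dataset, GetDims returns two values (rows, then columns), and ElementCount gives the length to allocate for the flat float[] buffer.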

As for the element type, you had better know it in advance; e.g. float is fine.
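
If you would rather verify the element type than assume it, something along these lines should work with HDF.PInvoke. Treat it as a sketch: DatasetType and IsFloat32 are hypothetical names, and I am assuming H5T.get_size returns an IntPtr, as the bindings map size_t that way.

```csharp
using System;
using HDF.PInvoke;

static class DatasetType
{
    // True if the dataset stores floating-point values the same size as a C# float.
    public static bool IsFloat32(long dataSetId)
    {
        long typeId = H5D.get_type(dataSetId);
        try
        {
            return H5T.get_class(typeId) == H5T.class_t.FLOAT
                && H5T.get_size(typeId).ToInt64() == sizeof(float);
        }
        finally
        {
            H5T.close(typeId);
        }
    }
}
```

Another option is to pass H5T.NATIVE_FLOAT as the memory datatype to H5D.read, which asks HDF5 to convert the stored values to 32-bit floats while reading.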

Upvotes: 2
