Reputation: 21
Image of my dataset:
I am using the HDF5DotNet with C# and I can read only full data as the attached image in the dataset. The hdf5 file is too big, up to nearly 10GB, and if I load the whole array into the memory then it will be out of memory.
I would like to read all data from rows 5 and 7 in the attached image. Is that anyway to read only these 2 rows data into memory in a time without having to load all data into memory first?
private static void OpenH5File()
{
var h5FileId = H5F.open(@"D:\Sandbox\Flood Modeller\Document\xmdf_results\FMA_T1_10ft_001.xmdf", H5F.OpenMode.ACC_RDONLY);
string dataSetName = "/FMA_T1_10ft_001/Temporal/Depth/Values";
var dataset = H5D.open(h5FileId, dataSetName);
var space = H5D.getSpace(dataset);
var dataType = H5D.getType(dataset);
long[] offset = new long[2];
long[] count = new long[2];
long[] stride = new long[2];
long[] block = new long[2];
offset[0] = 1; // start at row 5
offset[1] = 2; // start at column 0
count[0] = 2; // read 2 rows
count[0] = 165701; // read all columns
stride[0] = 0; // don't skip anything
stride[1] = 0;
block[0] = 1; // blocks are single elements
block[1] = 1;
// Dataspace associated with the dataset in the file
// Select a hyperslab from the file dataspace
H5S.selectHyperslab(space, H5S.SelectOperator.SET, offset, count, block);
// Dimensions of the file dataspace
var dims = H5S.getSimpleExtentDims(space);
// We also need a memory dataspace which is the same size as the file dataspace
var memspace = H5S.create_simple(2, dims);
double[,] dataArray = new double[1, dims[1]]; // just get one array
var wrapArray = new H5Array<double>(dataArray);
// Now we can read the hyperslab
H5D.read(dataset, dataType, memspace, space,
new H5PropertyListId(H5P.Template.DEFAULT), wrapArray);
}
Upvotes: 2
Views: 1107
Reputation: 912
Maybe you want to have a look at HDFql as it abstracts yourself from the low-level details of HDF5. Using HDFql in C#, you can read rows #5 and #7 of dataset Values
using a hyperslab selection like this:
float [,]data = new float[2, 165702];
HDFql.Execute("SELECT FROM Values[5:2:2:1] INTO MEMORY " + HDFql.VariableTransientRegister(data));
Afterwards, you can access these rows through variable data
. Example:
for(int x = 0; x < 2; x++)
{
for(int y = 0; y < 165702; y++)
{
System.Console.WriteLine(data[x, y]);
}
}
Upvotes: 0
Reputation: 5471
You need to select a hyperslab which has the correct offset, count, stride, and block for the subset of the dataset that you wish to read. These are all arrays which have the same number of dimensions as your dataset.
The block is the size of the element block in each dimension to read, i.e. 1 is a single element.
The offset is the number of blocks from the start of the dataset to start reading, and count is the number of blocks to read.
You can select non-contiguous regions by using stride, which again counts in blocks.
I'm afraid I don't know C#, so the following is in C. In your example, you would have:
hsize_t offset[2], count[2], stride[2], block[2];
offset[0] = 5; // start at row 5
offset[1] = 0; // start at column 0
count[0] = 2; // read 2 rows
count[1] = 165702; // read all columns
stride[0] = 1; // don't skip anything
stride[1] = 1;
block[0] = 1; // blocks are single elements
block[1] = 1;
// This assumes you already have an open dataspace with ID dataspace_id
H5Sselect_hyperslab(dataspace_id, H5S_SELECT_SET, offset, stride, count, block)
You can find more information on reading/writing hyperslabs in the HDF5 tutorial.
It seems there are two forms of H5D.read
in C#, you want the second form:
H5D.read(Type) Method (H5DataSetId, H5DataTypeId, H5DataSpaceId,
H5DataSpaceId, H5PropertyListId, H5Array(Type))
This allows you specify the memory and file dataspaces. Essentially, you need one dataspace which has information about the size, stride, offset, etc. of the variable in memory that you want to read into; and one dataspace for the dataset in the file that you want to read from. This lets you do things like read from a non-contiguous region in a file to a contiguous region in an array in memory.
You want something like
// Dataspace associated with the dataset in the file
var dataspace = H5D.get_space(dataset);
// Select a hyperslab from the file dataspace
H5S.selectHyperslab(dataspace, H5S.SelectOperator.SET, offset, count);
// Dimensions of the file dataspace
var dims = H5S.getSimpleExtentDims(dataspace);
// We also need a memory dataspace which is the same size as the file dataspace
var memspace = H5S.create_simple(rank, dims);
// Now we can read the hyperslab
H5D.read(dataset, datatype, memspace, dataspace,
new H5PropertyListId(H5P.Template.DEFAULT), wrapArray);
From your posted code, I think I've spotted the problem. First you do this:
var space = H5D.getSpace(dataset);
then you do
var dataspace = H5D.getSpace(dataset);
These two calls do the same thing, but create two different variables
You call H5S.selectHyperslab
with space
, but H5D.read
uses dataspace
.
You need to make sure you are using the correct variables consistently. If you remove the second call to H5D.getSpace
, and change dataspace -> space
, it should work.
Upvotes: 1