Reputation: 25
I have data stored in HDF5 format; the shape of the data is (10000, 100): 10000 vectors of 100 floats each.
I want to extract the data from the file into C++ vectors, so for this data I would have a vector of 10000 elements, where each element is a vector of 100 floats.
I am trying to create a memspace with one dimension of 100 elements, then read a single row from the file dataset into that memory, but I always get this error:
#001: ../../../src/H5Dio.c line 487 in H5D__read(): src and dest dataspaces have different number of elements selected
Here is my code:
H5File fp(... , H5F_ACC_RDONLY);
DataSet dset = fp.openDataSet("/dataset");
DataSpace dspace = dset.getSpace();
hsize_t rank;
hsize_t dims[2];
rank = dspace.getSimpleExtentDims(dims, NULL);
cout<<"Datasize: " << dims[0] << endl;
// Define the memory dataspace
hsize_t dimsm[1];
dimsm[0] = dims[1];
DataSpace memspace (1, dimsm);
// create a vector the same size as the dataset
vector<vector<float>> data;
data.resize(dims[0]);
for (hsize_t i = 0; i < dims[0]; i++) {
    data[i].resize(dims[1]);
}
//cout<<"Vectsize: "<< data.size() <<endl;
// Initialize hyperslabs
hsize_t dataCount[1] = {0,};
hsize_t dataOffset[1] = {0,};
hsize_t memCount[1] = {0,};
hsize_t memOffset[1] = {0,};
for (hsize_t i = 0; i < dims[0]; i++) {
    dataOffset[0] = i;
    dataCount[0] = dims[1];
    memOffset[0] = 0;
    memCount[0] = dims[1];
    dspace.selectHyperslab(H5S_SELECT_SET, dataCount, dataOffset);
    memspace.selectHyperslab(H5S_SELECT_SET, memCount, memOffset);
    dset.read(data[i].data(), PredType::IEEE_F32LE, memspace, dspace);
    printf("OK %d\n", (int)i);
}
Upvotes: 1
Views: 1219
Reputation: 912
A (simpler) way to solve this is to use HDFql, as it relieves you from HDF5 low-level details, in particular hyperslab/point selections, which can be rather difficult to set up. Your issue could be solved as follows using HDFql in C++:
// declare variables
std::stringstream script;
vector <vector<float>> data;
// use (i.e. open) an HDF5 file named 'my_file.h5'
HDFql::execute("USE FILE my_file.h5");
// set size of first dimension of variable 'data' to 10000
data.resize(10000);
// loop over the first dimension of dataset 'my_dataset'
for(int i = 0; i < 10000; i++)
{
    // set size of second dimension of variable 'data' to 100
    data[i].resize(100);
    // prepare script to read row 'i' of dataset 'my_dataset' using a point selection and populate variable 'data[i]' with it
    script << "SELECT FROM my_dataset[" << i << "] INTO MEMORY " << HDFql::variableTransientRegister(data[i]);
    // execute script
    HDFql::execute(script.str().c_str());
    // clear variable 'script'
    script.str("");
}
Upvotes: 1
Reputation: 13310
The dataset dataspace is 2D, but you manipulate it with a 1D dataCount and dataOffset. Therefore the selectHyperslab method reads garbage beyond the end of those input arrays. Try it like this:
hsize_t dataCount[2] = {1, dims[1]};
hsize_t dataOffset[2] = {0, 0};
const hsize_t memCount[1] = {dims[1]};
const hsize_t memOffset[1] = {0};
memspace.selectHyperslab(H5S_SELECT_SET, memCount, memOffset);
for (hsize_t i = 0; i < dims[0]; i++) {
    dataOffset[0] = i;
    dspace.selectHyperslab(H5S_SELECT_SET, dataCount, dataOffset);
    dset.read(data[i].data(), PredType::NATIVE_FLOAT, memspace, dspace);
}
Some parts are const and don't need to change inside the loop. I'm not even sure you need to select a hyperslab on the memory dataspace at all, since by default the entire dataspace is selected. Also, I've changed the memory datatype to the native float: you should read in the native format of the platform, even if the dataset itself is defined as IEEE_F32LE for consistency. HDF5 will handle the conversion.
Upvotes: 1