Freek Nossin

Reputation: 101

Create hdf5 file from scratch using file image operations (memory mapped hdf5 files)

Problem: I want to use memory mapped HDF5 files for our unit tests. Is it possible to create them from scratch?

Status: I've read the HDF5 file image operations document and tried to apply it. Depending on the exact parameters used, I either get an invalid file identifier (-1), or subsequent creation of datasets fails.

Typically our unit tests write new test files, mimicking users saving newly created data to a file on disk, so there is no existing file yet. The documentation of the HDF5 file image operations assumes that an initial file image is set. I don't have one, as I'm trying to stay as close as possible to the actual user scenario with my tests. Can such a file be created from an empty buffer?

static const unsigned int FileSize = 1024 * 1024 * 100;
std::vector<unsigned char> buffer(FileSize, 0);     // initialize buffer with zeroes
int flags = H5LT_FILE_IMAGE_DONT_COPY | 
            H5LT_FILE_IMAGE_OPEN_RW | 
            H5LT_FILE_IMAGE_DONT_RELEASE;
m_file = H5LTopen_file_image(static_cast<void*>(buffer.data()), buffer.size(), flags);

If I want to keep ownership of the buffer, as in the example, I don't get a valid file id. I suspected a bug in HDF5, but unfortunately leaving out the flags H5LT_FILE_IMAGE_DONT_COPY | H5LT_FILE_IMAGE_DONT_RELEASE didn't work either.

Upvotes: 3

Views: 833

Answers (2)

eudoxos

Reputation: 19075

Building upon @FreekNossin's answer, this is more complete code, using the C++ API where available:

#include <H5Cpp.h>
#include <stdexcept>
#include <vector>

/* create the HDF5 file image first */
H5::FileAccPropList accPList=H5::FileAccPropList::DEFAULT;
// https://confluence.hdfgroup.org/display/HDF5/H5P_SET_FAPL_CORE
herr_t h5err=H5Pset_fapl_core(accPList.getId(),/* memory increment size: 4M */1<<22,/*backing_store*/false);
if(h5err<0) throw std::runtime_error("H5Pset_fapl_core failed.");
H5::H5File h5file("whatever",H5F_ACC_TRUNC,H5::FileCreatPropList::DEFAULT,accPList);

/* add data like usual */
H5::Group grp=h5file.createGroup("somegroup");
/* ... */

/* get the image */
h5file.flush(H5F_SCOPE_LOCAL); // probably not necessary
ssize_t imgSize=H5Fget_file_image(h5file.getId(),NULL,0); // first call to determine size
if(imgSize<0) throw std::runtime_error("H5Fget_file_image failed."); // a negative return value signals an error
std::vector<char> buf(imgSize);
H5Fget_file_image(h5file.getId(),buf.data(),imgSize); // second call to actually copy the data into our buffer

EDIT: There is a pitfall in the code: if two threads open the same "whatever" (pseudo)file, H5::FileIException: unable to truncate a file which is already open is thrown. A workaround I use is to generate a unique name every time, like this:

static std::atomic<int> _var{0};
std::string hdf5name("whatever"+std::to_string(_var++));
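
The same unique-name idea can be wrapped in a small helper. This is only a minimal sketch and not part of the original code: the function name uniqueImageName and its prefix parameter are hypothetical, and it uses nothing beyond the C++ standard library.

```cpp
#include <atomic>
#include <string>

// Hypothetical helper (name and signature are assumptions for illustration):
// returns a different pseudo-file name on every call, also across threads.
std::string uniqueImageName(const std::string& prefix = "whatever") {
    // One process-wide counter, initialized once in a thread-safe way.
    static std::atomic<int> counter{0};
    // fetch_add atomically returns the previous value and increments it,
    // so two concurrent callers can never receive the same number.
    return prefix + std::to_string(counter.fetch_add(1));
}
```

The returned string could then be passed as the file name to the H5::H5File constructor. With the core driver and backing_store set to false, the name only has to be unique among the files currently open in the process.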

Upvotes: 3

Freek Nossin

Reputation: 101

Apparently H5LTopen_file_image wraps some calls that also allow for virtual file creation. This is all managed by the core file driver. The desired result can be obtained by passing some parameters to the core file driver.

auto propertyList = H5Pcreate(H5P_FILE_ACCESS);
auto h5Result = H5Pset_fapl_core(propertyList, m_buffer.GetSize(), false);
assert(h5Result >= 0 && "H5Pset_fapl_core failed");
m_file = H5Fcreate(name, flags, H5P_DEFAULT, propertyList);

The second parameter of H5Pset_fapl_core is the increment by which the in-memory buffer grows. The last parameter is the boolean "backing store" flag: if it is set to false, the file contents are not written to disk.

Note that in the end I had to use all the advanced tricks in the document referred to in the opening post to really get all the functionality working properly. The document is a good reference but is slightly outdated (the enums have different but similar names in the latest release).

Upvotes: 3
