I am working on some CFD simulations with C/CUDA and Python. At the moment the workflow goes like this:
Since I have a lot of data and also some metadata, I thought it would be better to switch to the HDF5 file format. So my idea was something like this:
I would really like to do some live analysis of the data, i.e. write to HDF5 from the C program and read the data directly from Python using PyTables. This would be very useful, but I am not sure how well PyTables supports it.
Since I have never worked with PyTables or HDF5, it would be good to know whether this is a good approach or whether there are pitfalls.
I think it is a reasonable approach, but there is indeed a pitfall. The HDF5 C library is not designed for concurrent access to the same file by independent processes, and its default build is not even thread-safe. (There is a "parallel" version; more on this later.) This means your scenario does not work out of the box. If one process writes to a file while another process reads from it, the reader can see inconsistent data and the file can end up corrupted. This applies even when the two processes touch different datasets. To make it work, you must either:
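The HDF Group also provides an MPI-based parallel version of HDF5 (PHDF5), which makes concurrent read/write access possible. Cf. http://www.hdfgroup.org/HDF5/PHDF5/. It was created for use cases like yours. The caveat is that every process accessing the file must open it collectively as part of the same MPI application.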
To my knowledge, PyTables does not provide any bindings to parallel HDF5. You should use h5py instead, which provides very user-friendly bindings to parallel HDF5 through its `mpio` driver together with mpi4py. See the examples here: http://docs.h5py.org/en/2.3/mpi.html
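Unfortunately, parallel HDF5 has a major drawback: at the time of writing, it does not support writing compressed datasets, although reading them is possible. Cf. http://www.hdfgroup.org/hdf5-quest.html#p5comp (Newer releases, starting with HDF5 1.10.2, added support for parallel writes to filtered datasets.)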
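If parallel HDF5 is not an option (for example because you need compression), one way to stop the writer and the reader from touching the file at the same time is to serialize access with an advisory lock. Below is a minimal sketch, assuming a POSIX system (Python's `fcntl` module is not available on Windows). It also assumes a hypothetical sidecar lock file named `<datafile>.lock` and a hypothetical helper `exclusive_access`. A plain text file stands in for the HDF5 file so the sketch runs with the standard library only.

With PyTables or h5py, the whole open/read/close of the HDF5 file would go inside the `with` block. The C writer would need to do the same: call `flock(fd, LOCK_EX)` from `<sys/file.h>` on the same lock file, and call `H5Fclose` before unlocking.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def exclusive_access(data_path):
    """Hold an exclusive advisory lock on a sidecar '<data_path>.lock' file.

    Hypothetical helper. The lock is advisory: it only protects the data
    file if *every* process (C writer and Python reader) takes it around
    the entire open/access/close of that file.
    """
    lock_path = data_path + ".lock"
    # Open (or create) the lock file; its contents are never used.
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other process holds it
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


# Demo with a plain file standing in for the HDF5 file.
data_path = os.path.join(tempfile.mkdtemp(), "snapshot.dat")

# "Writer": open, write and close entirely while holding the lock.
with exclusive_access(data_path):
    with open(data_path, "w") as f:
        f.write("step 1")

# "Reader": never sees a half-written file, because it waits for the lock.
with exclusive_access(data_path):
    with open(data_path) as f:
        contents = f.read()

print(contents)
```

The trade-off is that the writer and the reader take turns, so the analysis only ever sees snapshots between write phases. Note also that `flock` locks are not reliable on every network file system.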