Reputation: 6713
I have a large data frame (126041 obs. of 604 variables) and I'm new to the HDF5 format. I save it to an HDF5 file as follows:
writeH5DataFrame(myData,"C:/myDir/myHDF5.h5",overwrite=T)
How can I read the data frame back? There doesn't appear to be a readH5DataFrame or loadH5DataFrame function.
Also, writeH5DataFrame takes an incredibly long time, probably because of the large number of columns (604 in this case). The documentation mentions that "the data for each column is stored in a separate H5Dataset", but I'm not sure whether this is the reason for the long write time. Is there any way to speed up writing a data frame in HDF5 format?
Upvotes: 4
Views: 5652
Reputation: 121568
I don't know which package you are using, but with the rhdf5 package it looks very easy to write and read HDF5 files.
## uncomment the 2 lines below to install the package
## (BiocManager replaces the retired biocLite.R script)
## install.packages("BiocManager")
## BiocManager::install("rhdf5")
library(rhdf5)
## create an empty HDF5 file (the "database")
h5createFile("myhdf5file.h5")
## create the group hierarchy (groups hold the tables or datasets)
h5createGroup("myhdf5file.h5","group1")
h5createGroup("myhdf5file.h5","group2")
## save a matrix
A = matrix(1:10, nrow=5, ncol=2)
h5write(A, "myhdf5file.h5","group1/A")
## save an array with attribute
B = array(seq(0.1,2.0,by=0.1),dim=c(5,2,2))
attr(B, "scale") <- "liter"
h5write(B, "myhdf5file.h5","group2/B")
## check the data base
h5ls("myhdf5file.h5")
    group   name       otype  dclass       dim
0       / group1   H5I_GROUP
1 /group1      A H5I_DATASET INTEGER     5 x 2
2       / group2   H5I_GROUP
3 /group2      B H5I_DATASET   FLOAT 5 x 2 x 2
## read A and B
D = h5read("myhdf5file.h5","group1/A")
E = h5read("myhdf5file.h5","group2/B")
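The example above covers a matrix and an array, while the question is about a data frame. As a minimal sketch, assuming rhdf5 is installed: h5write also accepts a data.frame, which it stores by default as a single compound dataset instead of one dataset per column, and h5read returns it as a data.frame. The file name and the toy data frame below are made up for illustration, and this has not been benchmarked against writeH5DataFrame on a 604-column frame.

```r
library(rhdf5)

## hypothetical file and toy data frame, for illustration only
h5file <- tempfile(fileext = ".h5")
df <- data.frame(id    = 1:5,
                 value = seq(0.1, 0.5, by = 0.1),
                 label = c("a", "b", "c", "d", "e"),
                 stringsAsFactors = FALSE)

h5createFile(h5file)

## write the whole data frame as one compound dataset
h5write(df, h5file, "df")

## read it back; a compound dataset comes back as a data.frame
df2 <- h5read(h5file, "df")
str(df2)

## if every column is numeric, a single matrix dataset is an alternative;
## note that column names are not stored with the matrix by default
num <- as.matrix(df[, c("id", "value")])
h5write(num, h5file, "num")
num2 <- h5read(h5file, "num")
dim(num2)
```

If the real data frame is entirely numeric, the matrix route may be worth trying first, since it writes one dataset for all columns; factor columns would need converting to character or integer beforehand.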
Upvotes: 3