uday

h5read crashes with large strings

I have an HDF5 file written with the rhdf5 package. The output of h5ls(myHDF5, all=TRUE) is:

  group      name       otype  dclass       dim
0     /     char5 H5I_DATASET  STRING   1867124
1     /     char6 H5I_DATASET  STRING   1867124
2     /     char7 H5I_DATASET  STRING   1867124
3     /      dims H5I_DATASET INTEGER         2
4     /   headers H5I_DATASET  STRING       212
5     /       int H5I_DATASET INTEGER 233390500
6     /  intorder H5I_DATASET INTEGER       125
7     /      real H5I_DATASET   FLOAT 156838416
8     / realorder H5I_DATASET INTEGER        84

Reading the headers object, which is a string vector, works fine: headers <- h5read(myHDF5, "headers")

But reading a larger string vector, char5 <- h5read(myHDF5, "char5"), crashes R (the RStudio session restarts).

The larger string vector char5 was stored as follows:

nr<-length(char5)
mxsize<-max(nchar(char5))
# fixed-length strings of width mxsize, gzip level 9, chunks of 10000 elements
h5createDataset(myHDF5,"char5",storage.mode="character",level=9,dims=nr,chunk=10000,size=mxsize)
h5write(char5,myHDF5,"char5")

while the smaller string vector headers was stored as follows:

nc<-length(headers)
mxsize<-max(nchar(headers))  
h5createDataset(myHDF5,"headers",storage.mode="character",level=9,dims=nc,chunk=nc,size=mxsize)
h5write(headers,myHDF5,"headers")

The main difference is the chunk size. I changed the chunk size of the larger string vector to match its dims (chunk=nr), and R still crashes.

What could be causing R to crash?

Note: R doesn't crash if I read the integer or float data from the myHDF5 file.
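For anyone trying to reproduce this, the write/read steps described above can be condensed into one self-contained script. This is a sketch assuming the Bioconductor rhdf5 package is installed; the sample strings and the temporary file are made up for illustration, and a vector this small will not necessarily trigger the crash, which is only reported here for a 1,867,124-element dataset.

```r
library(rhdf5)

# Hypothetical stand-in for the real char5 vector (the original has 1,867,124 elements).
char5 <- c("alpha", "beta", "gamma", "delta", "epsilon")
myHDF5 <- tempfile(fileext = ".h5")
h5createFile(myHDF5)

# Same calls as in the question: fixed-length strings of width mxsize, gzip level 9.
nr <- length(char5)
mxsize <- max(nchar(char5))
# chunk = nr because a chunk cannot be larger than a fixed-size dataset's dims
h5createDataset(myHDF5, "char5", storage.mode = "character", level = 9,
                dims = nr, chunk = nr, size = mxsize)
h5write(char5, myHDF5, "char5")

# Read the dataset back; this is the step that crashes R for the full-size data.
char5_back <- h5read(myHDF5, "char5")
h5closeAll()
print(char5_back)
```

If the round trip succeeds at this size, gradually increasing the length of char5 toward the size shown by h5ls may help narrow down where the crash starts.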

Answers (1)

Michael Menden

I had the same problem. A simple, if imperfect, workaround is to read the data with the h5r package instead:

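library(h5r)

f <- H5File(h5FilePath)                 # h5FilePath: path to the .h5 file
g <- getH5Group(f, "/")                 # root group of the file
d <- getH5Dataset(g, "stringArray")[]   # "stringArray": name of the string dataset; [] reads all of it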
