Nikita Vlasenko

Reputation: 4352

Can't write hdf5 file: Error in H5Dcreate

When I try to write a 30,000 x 80,000 dgCMatrix using rhdf5 in RStudio:

h5writeDataset(as.matrix(dge_cut), file, 'rawcounts')

I get the following error:

Error in H5Dcreate(loc$H5Identifier, dataset, tid, sid, dcpl = dcpl) :
  HDF5. Dataset. Unable to initialize object.
In addition: Warning message:
In h5createDataset(h5loc, name, dim, storage.mode = storage.mode(obj), :
  You created a large dataset with compression and chunking. The chunk size is equal to the dataset dimensions. If you want to read subsets of the dataset, you should test smaller chunk sizes to improve read times. Turn off this warning with showWarnings=FALSE.
Error in H5Dopen(h5loc, name) : HDF5. Dataset. Object not found.
Error in h5writeDatasetHelper(obj = obj, h5dataset = h5dataset, index = index, :
  object 'h5dataset' not found
Error in h5writeDatasetHelper(obj = obj, h5dataset = h5dataset, index = index, :
  object 'h5dataset' not found
In addition: Warning message:
In is(h5id, "H5IdComponent") : restarting interrupted promise evaluation
Error in H5Dclose(h5dataset) : object 'h5dataset' not found

The file definitely exists and is open.

sessionInfo():

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

I do not understand why this is happening. Any suggestions would be greatly appreciated.

Upvotes: 2

Views: 1576

Answers (1)

B. McV

Reputation: 31

I ran into a similar issue when trying to save a vector with ~1.1 billion entries. The issue seemed to be related to the compression chunk being too large: by default, the chunk size equals the dimensions of the dataset being saved. A fix that worked for me was to create the dataset first and set the chunk to something smaller. You could see if something like the following runs:

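# Create the dataset up front with an explicit chunk size
# instead of the default (the full dataset dimensions)
h5createDataset(file, 'rawcounts', c(30000, 80000), chunk = c(1000, 1000))
# Write the dense matrix into the dataset that now exists
h5writeDataset(as.matrix(dge_cut), file, 'rawcounts')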

It's probably not the case that 1000 x 1000 is the best chunk size to choose, but it is a place to start.
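
To see why the default chunk might be the problem, here is a rough back-of-the-envelope check in base R. It assumes the data is stored as 8-byte doubles (which is what as.matrix() on a dgCMatrix gives you) and relies on HDF5's rule that a single chunk must be smaller than 4 GiB. The helper chunk_bytes() and the variable names are made up for illustration, they are not part of rhdf5.

```r
# Bytes occupied by one uncompressed chunk of the given dimensions.
# bytes_per_element = 8 assumes double storage (an assumption, see above).
chunk_bytes <- function(chunk_dims, bytes_per_element = 8) {
  prod(as.numeric(chunk_dims)) * bytes_per_element
}

# HDF5 requires each chunk to be smaller than 4 GiB.
hdf5_chunk_limit <- 2^32

dataset_dims <- c(30000, 80000)

default_chunk <- chunk_bytes(dataset_dims)   # default: chunk = whole dataset
small_chunk   <- chunk_bytes(c(1000, 1000))  # chunk size suggested above

default_chunk < hdf5_chunk_limit  # FALSE: 1.92e10 bytes, about 17.9 GiB
small_chunk < hdf5_chunk_limit    # TRUE: 8e6 bytes, about 7.6 MiB
```

If that reasoning holds, any chunk shape that stays under the limit should get past the H5Dcreate error. Which shape reads fastest depends on whether you later pull out rows, columns, or blocks.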

Upvotes: 3
