Reputation: 4352
When I try to write a dgCMatrix of 30,000 x 80,000 using rhdf5 in RStudio:
h5writeDataset(as.matrix(dge_cut), file, 'rawcounts')
I am getting the error:
Error in H5Dcreate(loc$H5Identifier, dataset, tid, sid, dcpl = dcpl) :
  HDF5. Dataset. Unable to initialize object.
In addition: Warning message:
In h5createDataset(h5loc, name, dim, storage.mode = storage.mode(obj), :
  You created a large dataset with compression and chunking. The chunk size is equal to the dataset dimensions. If you want to read subsets of the dataset, you should test smaller chunk sizes to improve read times. Turn off this warning with showWarnings=FALSE.
Error in H5Dopen(h5loc, name) : HDF5. Dataset. Object not found.
Error in h5writeDatasetHelper(obj = obj, h5dataset = h5dataset, index = index, :
  object 'h5dataset' not found
Error in h5writeDatasetHelper(obj = obj, h5dataset = h5dataset, index = index, :
  object 'h5dataset' not found
In addition: Warning message:
In is(h5id, "H5IdComponent") : restarting interrupted promise evaluation
Error in H5Dclose(h5dataset) : object 'h5dataset' not found
The file definitely exists and is open.
sessionInfo():
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4
I do not understand why this is happening. Any suggestions would be greatly appreciated.
Upvotes: 2
Views: 1576
Reputation: 31
I ran into a similar issue when trying to save a vector with ~1.1 billion entries. The problem seemed to be that the compression chunk was too large; by default, the chunk size is the full dimensions of the dataset being saved. A fix that worked for me was to create the dataset first and set the chunk size to something smaller. You could see if something like the following runs:
h5createDataset(file, 'rawcounts', c(30000, 80000), chunk = c(1000, 1000))
h5writeDataset(as.matrix(dge_cut), file, 'rawcounts')
It's probably not the case that 1000 x 1000 is the best chunk size to choose, but it's a place to start.
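For reference, here is a minimal end-to-end sketch of that approach. It assumes a sparse matrix named dge_cut and an output file 'counts.h5'; both names are placeholders, so substitute your own objects:
library(rhdf5)

file <- 'counts.h5'
if (!file.exists(file)) h5createFile(file)

# Pre-create the dataset with an explicit, smaller chunk size so HDF5
# does not try to use one chunk the size of the whole matrix.
# (The chunk dimensions must not exceed the dataset dimensions.)
h5createDataset(file, 'rawcounts', dims = dim(dge_cut),
                storage.mode = 'double', chunk = c(1000, 1000))

# Write the dense matrix into the pre-created dataset.
h5writeDataset(as.matrix(dge_cut), file, 'rawcounts')

# Release any open HDF5 handles when done.
h5closeAll()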
Upvotes: 3