I want to save an R TRUE/FALSE value in HDF5 so that, when the file is read into Python, a check that the data type is Boolean passes. At the moment I can't do this. If I use:
library(rhdf5)
h5file = H5Fcreate("newfile.h5")
h5space = H5Screate_simple(1,NULL, native = TRUE)
h5dataset1 = H5Dcreate(h5file, "dataset1", "H5T_NATIVE_HBOOL", h5space)
H5Dwrite(h5dataset1, TRUE)
h5closeAll()
If I then inspect the dataset using HDFView (3.1.3), I can see that it is stored as an 8-bit unsigned integer.
To pass a Python data type test along the lines of np.array(getattr(x,attr)).dtype == bool, the type needs to show up in HDFView as: 8-bit enum (0=FALSE, 1=TRUE).
How can I write an object of this type using either of the two R HDF5 packages, rhdf5 or hdf5r?
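On the Python side, the distinction HDFView shows is exactly what the dtype test sees. The sketch below assumes the reader is h5py, since the question doesn't say which library produces x. It writes the same values twice: once as plain uint8, which is what H5T_NATIVE_HBOOL ends up as, and once from a NumPy bool array, which h5py stores as a FALSE/TRUE enum. It then applies the dtype check to each. The function and file names are illustrative only.

```python
import os
import tempfile

import h5py
import numpy as np


def roundtrip(path):
    """Write 1/1/0/1 two ways; return {name: (dtype read back, stored as enum?)}."""
    values = [1, 1, 0, 1]
    with h5py.File(path, "w") as f:
        # Plain unsigned 8-bit integers, which is what H5T_NATIVE_HBOOL gives you.
        f.create_dataset("as_uint8", data=np.array(values, dtype=np.uint8))
        # A NumPy bool array, which h5py stores as an 8-bit enum (FALSE=0, TRUE=1).
        f.create_dataset("as_bool", data=np.array(values, dtype=bool))
    result = {}
    with h5py.File(path, "r") as f:
        for name in ("as_uint8", "as_bool"):
            is_enum = f[name].id.get_type().get_class() == h5py.h5t.ENUM
            result[name] = (f[name].dtype, is_enum)
    return result


path = os.path.join(tempfile.mkdtemp(), "check.h5")
for name, (dtype, is_enum) in roundtrip(path).items():
    # The question's test: does the dtype compare equal to bool?
    print(name, dtype, "enum" if is_enum else "integer", dtype == bool)
```

Only the enum-backed dataset should pass the check. The uint8 one reads back as uint8, just as HDFView reports.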
Thanks for the question, and sorry it's taken me a while to get round to answering it.
This wasn't possible with the version of rhdf5 available at the time, and it also requires a slightly different approach. The H5T_NATIVE_HBOOL datatype is just a mapping to an unsigned 8-bit int (at least on Linux).
To create the enum datatype you're looking for, you have to create a custom datatype using H5Tenum_create(), and then set the mapping (e.g. TRUE = 1) using H5Tenum_insert().
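The same two steps can be mirrored in Python through h5py's low-level h5t module. This is h5py's API, not rhdf5's, but it is a quick way to see what a reader will make of the resulting type. The sketch builds an enum over an unsigned 8-bit base, as the rhdf5 example below does with "H5T_NATIVE_UCHAR", and asks h5py which NumPy dtype it maps to. The helper name make_bool_enum is hypothetical.

```python
import h5py
import numpy as np


def make_bool_enum(members):
    """Build an HDF5 enum type over an unsigned 8-bit base, roughly mirroring
    H5Tenum_create() followed by one H5Tenum_insert() per member."""
    tid = h5py.h5t.enum_create(h5py.h5t.NATIVE_UINT8)
    for name, value in members.items():
        # The low-level API takes member names as bytes.
        tid.enum_insert(name.encode("ascii"), value)
    return tid


# Same mapping as in the rhdf5 example: TRUE = 1, FALSE = 0.
rhdf5_style = make_bool_enum({"TRUE": 1, "FALSE": 0})
# Same values but different member names, for comparison.
other_names = make_bool_enum({"yes": 1, "no": 0})

# Which NumPy dtype does h5py translate each enum type to?
print(rhdf5_style.dtype == bool, other_names.dtype == bool)
```

As far as h5py's behaviour goes, the member names matter as much as the values. h5py treats an enum as bool only when its members are exactly FALSE = 0 and TRUE = 1, so the spelling passed to H5Tenum_insert() is significant.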
Here's an example. You'll need rhdf5 version 2.43.1 or newer, which you can get from https://github.com/grimbough/rhdf5
library(rhdf5)
## our input data. Note we're using 1 & 0
## but TRUE/FALSE would also work in this example
dat <- c(1, 1, 0, 1)
## create an HDF5 file
file <- tempfile(fileext = ".h5")
h5file = H5Fcreate(file)
## create the dataspace for our new data
h5space = H5Screate_simple(dims = length(dat), NULL, native = TRUE)
## create the enum datatype with our mapping
## TRUE = 1 FALSE = 0
tid <- H5Tenum_create(dtype_id = "H5T_NATIVE_UCHAR")
H5Tenum_insert(tid, name = "TRUE", value = 1L)
H5Tenum_insert(tid, name = "FALSE", value = 0L)
## create the dataset with this new enum datatype
h5dataset1 = H5Dcreate(h5file, "dataset1", tid, h5space)
## write the data. We have to use as.raw() because our
## base type is 8-bit and R integers are 32-bit
H5Dwrite(h5dataset1, as.raw(dat), h5type = tid)
## tidy up
h5closeAll()
We can use the h5ls command line tool to check that our datatype is an 8-bit enum with the (0=FALSE, 1=TRUE) mapping.
system2("h5ls", args = c("-v", file))
#> Opened "/tmp/Rtmp4zU9m5/file36f657af24dcb.h5" with sec2 driver.
#> dataset1 Dataset {4/4}
#> Location: 1:800
#> Links: 1
#> Storage: 4 logical bytes, 4 allocated bytes, 100.00% utilization
#> Type: enum native unsigned char {
#> TRUE = 1
#> FALSE = 0
#> }
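The same inspection can be done from Python, which is where the original dtype test runs. The sketch below assumes h5py. describe_dataset is a hypothetical helper that lists the enum members much as h5ls does and then applies the question's dtype check. The R-written file can't be assumed to exist here, so the demo uses a stand-in file written by h5py itself.

```python
import os
import tempfile

import h5py
import numpy as np


def describe_dataset(path, name):
    """Return (enum members or None, whether dtype == bool) for one dataset."""
    with h5py.File(path, "r") as f:
        dset = f[name]
        tid = dset.id.get_type()
        members = None
        if tid.get_class() == h5py.h5t.ENUM:
            # List each member's name and value, as h5ls -v does.
            members = {
                tid.get_member_name(i).decode("ascii"): tid.get_member_value(i)
                for i in range(tid.get_nmembers())
            }
        # The same test the question wants to pass.
        return members, dset.dtype == bool


# Stand-in file: h5py's own bool dataset is also a FALSE/TRUE enum.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("dataset1", data=np.array([True, True, False, True]))
    f.create_dataset("plain_uint8", data=np.array([1, 1, 0, 1], dtype=np.uint8))

print(describe_dataset(demo_path, "dataset1"))
print(describe_dataset(demo_path, "plain_uint8"))
```

Pointed at the file written by the rhdf5 code (the path returned by R's tempfile()), describe_dataset(path, "dataset1") should report the TRUE/FALSE members and True.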
We can also read it back into R.
## we can read it back in and get a factor
h5read(file, name = "/dataset1")
#> [1] TRUE TRUE FALSE TRUE
#> Levels: TRUE FALSE
I don't love this, because you aren't getting back exactly what you wrote: you get a factor rather than a logical vector.
You may want to explore a third option: the HDFql package. An 8-bit enum dataset named dataset1 containing two members (FALSE with value 0 and TRUE with value 1) can be created with HDFql in R as follows:
source("HDFql.R")
hdfql_execute("CREATE FILE newfile.h5")
hdfql_execute("CREATE DATASET newfile.h5 dataset1 AS ENUMERATION(FALSE AS 0, TRUE AS 1)")
For more information on how it works, please check the HDFql reference manual and examples.