Dave_L

Reputation: 383

Is it possible to write a boolean to an HDF5 file in R that Python will recognise as an 8-bit enum?

I want to save an R TRUE/FALSE value in HDF5 so that, when the file is read into Python, a check that the data type is Boolean passes. At the moment I can't do this. This is what I have tried:

library(rhdf5)
h5file = H5Fcreate("newfile.h5")
h5space = H5Screate_simple(1,NULL, native = TRUE)
h5dataset1 = H5Dcreate(h5file, "dataset1", "H5T_NATIVE_HBOOL", h5space)
H5Dwrite(h5dataset1, TRUE)
h5closeAll()

When I inspect the dataset in HDFView (3.1.3), I can see that the saved object is stored as an 8-bit unsigned integer.

To pass a Python data type test along the lines of np.array(getattr(x,attr)).dtype == bool, the type needs to show up in HDFView as: 8-bit enum (0=FALSE, 1=TRUE).

How can I write an object of this type using either of the two R HDF5 packages rhdf5 or hdf5r?

Upvotes: 1

Views: 601

Answers (2)

Grimbough

Reputation: 136

Thanks for the question, and sorry it took me a while to get round to answering it.

This was not possible with the version of rhdf5 available when you asked, and it also needs a slightly different approach from the one you tried. The H5T_NATIVE_HBOOL datatype is just a mapping to an unsigned 8-bit int (at least on Linux), which is why HDFView reports an integer rather than an enum.

To create the enum datatype you're looking for, you have to create a custom datatype using H5Tenum_create(), and then set the mapping (e.g. TRUE = 1) using H5Tenum_insert().

Here's an example. You'll need rhdf5 version 2.43.1 or newer, which you can get from https://github.com/grimbough/rhdf5

library(rhdf5)

## our input data.  Note we're using 1 & 0
## but TRUE/FALSE would also work in this example
dat <- c(1, 1, 0, 1)

## create an HDF5 file
file <- tempfile(fileext = ".h5")
h5file = H5Fcreate(file)

## create the dataspace for our new data
h5space = H5Screate_simple(dims = length(dat), NULL, native = TRUE)

## create the enum datatype with our mapping
## TRUE = 1 FALSE = 0
tid <- H5Tenum_create(dtype_id = "H5T_NATIVE_UCHAR")
H5Tenum_insert(tid, name = "TRUE", value = 1L)
H5Tenum_insert(tid, name = "FALSE", value = 0L)

## create the dataset with this new datatype
h5dataset1 = H5Dcreate(h5file, "dataset1", tid, h5space)

## write the data.  We have to use as.raw() because our
## base type is 8-bit and R integers are 32-bit
H5Dwrite(h5dataset1, as.raw(dat), h5type = tid)

## tidy up
h5closeAll()

We can use the h5ls command line tool to check our datatype is 8-bit enum and we have the (0=FALSE, 1=TRUE) mapping.

system2("h5ls", args = c("-v", file))
#> Opened "/tmp/Rtmp4zU9m5/file36f657af24dcb.h5" with sec2 driver.
#> dataset1                 Dataset {4/4}
#>     Location:  1:800
#>     Links:     1
#>     Storage:   4 logical bytes, 4 allocated bytes, 100.00% utilization
#>     Type:      enum native unsigned char {
#>                    TRUE             = 1
#>                    FALSE            = 0
#>                }

We can also read it back into R.

## we can read it back in and get a factor
h5read(file, name = "/dataset1")
#> [1] TRUE  TRUE  FALSE TRUE 
#> Levels: TRUE FALSE

I don't love this, because you aren't getting back exactly what you wrote: the values come back as a factor rather than as the numeric (or logical) vector that went in.
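
If you need a logical vector on the R side, one way round this is to convert the factor via its level labels. This is only a minimal sketch in base R: it uses a hand-built factor as a stand-in for the h5read() result above, and the names x and restored are purely illustrative.

```r
## stand-in for the factor that h5read() returned above
x <- factor(c("TRUE", "TRUE", "FALSE", "TRUE"), levels = c("TRUE", "FALSE"))

## convert via the level labels rather than the integer codes;
## as.integer(x) gives the level positions (1 for "TRUE", 2 for "FALSE"),
## not the 1/0 values stored in the file
restored <- as.logical(as.character(x))
```

This only changes what you hold in R after reading; the dataset on disk is still the 8-bit enum.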

Upvotes: 2

SOG

Reputation: 912

You may want to explore a third option, the HDFql package. An 8-bit enum dataset named dataset1 with two members (FALSE with value 0 and TRUE with value 1) can be created using HDFql in R as follows:

source("HDFql.R")

hdfql_execute("CREATE FILE newfile.h5")

hdfql_execute("CREATE DATASET newfile.h5 dataset1 AS ENUMERATION(FALSE AS 0, TRUE AS 1)")

For additional information, please check the HDFql reference manual and examples.

Upvotes: 1
