Reputation: 83147
I'm getting this warning message when I try to load a data frame saved in pandas as an HDF5 file in R:
Warning message: In H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem, : NAs produced by integer overflow while converting 64-bit integer or unsigned 32-bit integer from HDF5 to a 32-bit integer in R. Choose bit64conversion='bit64' or bit64conversion='double' to avoid data loss and see the vignette 'rhdf5' for more details about 64-bit integers.
For example, if I create an HDF5 file in pandas with:
import pandas as pd
frame = pd.DataFrame({
    'time': [1234567001, 1234515616515167005],
    'X2': [23.88, 23.96]
}, columns=['time', 'X2'])
store = pd.HDFStore('a.hdf5')
store['df'] = frame
store.close()
print(frame)
which returns:
                  time     X2
0           1234567001  23.88
1  1234515616515167005  23.96
and try to load it in R:
#source("http://bioconductor.org/biocLite.R")
#biocLite("rhdf5")
library(rhdf5)
loadhdf5data <- function(h5File) {
  # Function taken from [How can I load a data frame saved in pandas as an HDF5 file in R?](https://stackoverflow.com/a/45024089/395857)
  listing <- h5ls(h5File)
  # Find all data nodes, values are stored in *_values and corresponding column
  # titles in *_items
  data_nodes <- grep("_values", listing$name)
  name_nodes <- grep("_items", listing$name)
  data_paths = paste(listing$group[data_nodes], listing$name[data_nodes], sep = "/")
  name_paths = paste(listing$group[name_nodes], listing$name[name_nodes], sep = "/")
  columns = list()
  for (idx in seq(data_paths)) {
    print(idx)
    data <- data.frame(t(h5read(h5File, data_paths[idx])))
    names <- t(h5read(h5File, name_paths[idx], bit64conversion='bit64'))
    #names <- t(h5read(h5File, name_paths[idx], bit64conversion='double'))
    entry <- data.frame(data)
    colnames(entry) <- names
    columns <- append(columns, entry)
  }
  data <- data.frame(columns)
  return(data)
}
frame = loadhdf5data("a.hdf5")
I get this warning message:
> frame = loadhdf5data("a.hdf5")
[1] 1
[1] 2
Warning message:
In H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem, :
NAs produced by integer overflow while converting 64-bit integer or unsigned 32-bit integer from HDF5 to a 32-bit integer in R. Choose bit64conversion='bit64' or bit64conversion='double' to avoid data loss and see the vignette 'rhdf5' for more details about 64-bit integers.
and I can see that one of the time values became NA:
> frame
     X2       time
1 23.88 1234567001
2 23.96         NA
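Only the second row is affected, which is consistent with the warning text. As a minimal sketch, assuming R's native integers are signed 32-bit values, the range check below shows which of the two `time` values fits. The helper name `fits_int32` is hypothetical and used only for illustration:

```python
# Bounds of a signed 32-bit integer (assumed here to match R's integer type).
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_int32(value):
    """Return True if value lies within the signed 32-bit integer range."""
    return INT32_MIN <= value <= INT32_MAX


# The two 'time' values stored in a.hdf5 by the pandas snippet above.
times = [1234567001, 1234515616515167005]

for t in times:
    # The first value fits; the second is far outside the range.
    print(t, fits_int32(t))
```

The second value needs 61 bits, so any conversion to a 32-bit integer has to discard it.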
How can I fix this issue? Choosing bit64conversion='bit64' or bit64conversion='double' doesn't change anything.
> R.version
               _
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          3
minor          4.0
year           2017
month          04
day            21
svn rev        72570
language       R
version.string R version 3.4.0 (2017-04-21)
nickname       You Stupid Darkness
Upvotes: 2
Views: 438
Reputation: 83147
The rhdf5 documentation for the HDF5 Dataset Interface says:
bit64conversion: Defines, how 64-bit integers are converted. Internally, R does not support 64-bit integers. All integers in R are 32-bit integers. By setting bit64conversion='int', a coercing to 32-bit integers is enforced, with the risc of data loss, but with the insurance that numbers are represented as integers. bit64conversion='double' coerces the 64-bit integers to floating point numbers. doubles can represent integers with up to 54-bits, but they are not represented as integer values anymore. For larger numbers there is again a data loss. bit64conversion='bit64' is recommended way of coercing. It represents the 64-bit integers as objects of class 'integer64' as defined in the package 'bit64'. Make sure that you have installed 'bit64'. The datatype 'integer64' is not part of base R, but defined in an external package. This can produce unexpected behaviour when working with the data.
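The point about doubles matters for this data. As a rough sketch (in Python rather than R, and assuming IEEE 754 double precision, which both languages use for floating point), the larger `time` value does not survive a round trip through a double, so `bit64conversion='double'` would also lose information here. The helper name `survives_double` is hypothetical:

```python
def survives_double(value):
    """Return True if an integer is unchanged after conversion to a double and back."""
    return int(float(value)) == value


# The two 'time' values from the question.
times = [1234567001, 1234515616515167005]

# Map each value to whether a double can represent it exactly.
results = {t: survives_double(t) for t in times}

for t, ok in results.items():
    print(t, "exact as double" if ok else "rounded as double")
```

The smaller value is well below 2**53 and is preserved, while the larger one lies between 2**60 and 2**61, where consecutive doubles are 256 apart, so it gets rounded. That is why `'bit64'` is the option to use for this file.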
You should therefore install bit64 (install.packages("bit64")) and load it (library(bit64)). You can check that integer64 is loaded:
> integer64
function (length = 0)
{
    ret <- double(length)
    oldClass(ret) <- "integer64"
    ret
}
<bytecode: 0x000000001a7a95f0>
<environment: namespace:bit64>
Now you can run:
library(bit64)
library(rhdf5)
loadhdf5data <- function(h5File) {
  listing <- h5ls(h5File)
  # Find all data nodes, values are stored in *_values and corresponding column
  # titles in *_items
  data_nodes <- grep("_values", listing$name)
  name_nodes <- grep("_items", listing$name)
  data_paths = paste(listing$group[data_nodes], listing$name[data_nodes], sep = "/")
  name_paths = paste(listing$group[name_nodes], listing$name[name_nodes], sep = "/")
  columns = list()
  for (idx in seq(data_paths)) {
    print(idx)
    data <- data.frame(t(h5read(h5File, data_paths[idx], bit64conversion='bit64')))
    names <- t(h5read(h5File, name_paths[idx], bit64conversion='bit64'))
    entry <- data.frame(data)
    colnames(entry) <- names
    columns <- append(columns, entry)
  }
  data <- data.frame(columns)
  return(data)
}
frame = loadhdf5data("a.hdf5")
which gives:
> frame
     X2                time
1 23.88          1234567001
2 23.96 1234515616515167005
Upvotes: 1