Reputation: 987
I have a data frame in R that I am passing to H2O using as.h2o():
dataset.h2o <- as.h2o(dataset, destination_frame = "dataset.h2o")
Running str() on the data frame shows that week_of_date is of class Date:
$ primary_account_id : int 31 31 31 31 31 31 31 31 31 31 ...
$ week_of_date : Date, format: "2015-08-31" "2015-09-07" "2015-09-14" "2015-09-21" ...
However, when viewed in H2O Flow, the column seems to have been converted to a type called time, with this summary:
week_of_date  time  missing: 0  zeros: 0  +Inf: 0  -Inf: 0  min: 1440943200000.0  max: 1447592400000.0  mean: 1444480409625.8884  sigma: 2013362534.5706
When I bring the data back into R using as.data.frame():
returned.dataset <- as.data.frame(dataset.h2o)
the column is stored in a format that I cannot understand and therefore cannot parse back into dates:
$ primary_account_id: int 31 31 698 1060 1060 1060 1060 1060 1060 1133 ...
$ week_of_date :Class 'POSIXct' num [1:194] 1442757600000 1446382800000 1446382800000 1442152800000 1442757600000 ...
Could you please point me in the direction of how I can achieve better interoperability with dates between R and H2O?
Thanks!
Upvotes: 3
Views: 3386
Reputation: 6222
Converting to H2O and back is easy if the date-time columns are in the proper format (millisecond precision of times can be lost). As mentioned in the H2O FAQ:
H2O is set to auto-detect two major date/time formats. The first format is for dates formatted as yyyy-MM-dd. ... The second date format is for dates formatted as dd-MMM-yy.
Times are specified as HH:mm:ss. HH is a two-digit hour and must be a value between 0-23 (for 24-hour time) or 1-12 (for a twelve-hour clock). mm is a two-digit minute value and must be a value between 0-59. ss is a two-digit second value and must be a value between 0-59.
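As a small R sketch of producing those two patterns from a Date column before calling as.h2o(): the helper name to_h2o_date_strings() is hypothetical, and the R format codes are my mapping of the FAQ's patterns ("%Y-%m-%d" for yyyy-MM-dd, "%d-%b-%y" for dd-MMM-yy). Because %b month abbreviations depend on the locale, the helper switches LC_TIME to "C" temporarily; I have not checked whether H2O's parser cares about month-name casing.

```r
# Hypothetical helper: format R Date values in one of the two date
# patterns the H2O FAQ says are auto-detected.
#   "ymd" -> yyyy-MM-dd  (R format "%Y-%m-%d")
#   "dmy" -> dd-MMM-yy   (R format "%d-%b-%y")
to_h2o_date_strings <- function(d, style = c("ymd", "dmy")) {
  style <- match.arg(style)
  # %b (month abbreviation) is locale-dependent, so use the C locale
  # temporarily to get English abbreviations such as "Sep"
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  fmt <- if (style == "ymd") "%Y-%m-%d" else "%d-%b-%y"
  format(d, format = fmt)
}

d <- as.Date(c("2015-09-29", "2015-10-05"))
to_h2o_date_strings(d)          # "2015-09-29" "2015-10-05"
to_h2o_date_strings(d, "dmy")   # "29-Sep-15" "05-Oct-15"
```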
Example
Example Data
dates <- c("02/27/92", "02/27/92", "01/14/92", "02/28/92", "02/01/92")
times <- c("23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26")
x <- paste(dates, times)
df <- data.frame(datetime = strptime(x, "%m/%d/%y %H:%M:%S"))
# > df
# datetime
# 1 1992-02-27 23:03:20
# 2 1992-02-27 22:29:56
# 3 1992-01-14 01:03:30
# 4 1992-02-28 18:21:03
# 5 1992-02-01 16:56:26
Change the format to one that H2O prefers:
# Change format
df$datetime <- format(df$datetime, format = "%Y-%m-%d %H:%M:%S")
# Convert to an H2OFrame
h2o_df <- as.h2o(df)
# Convert back
back_df <- as.data.frame(h2o_df)
back_df
# datetime
# 1 1992-02-27 23:03:20
# 2 1992-02-27 22:29:56
# 3 1992-01-14 01:03:30
# 4 1992-02-28 18:21:03
# 5 1992-02-01 16:56:26
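One place where sub-second precision disappears in this workflow is the format() step itself, before the data ever reaches H2O. A base-R sketch (no H2O needed), using an assumed timestamp with a fractional second: "%S" writes whole seconds only, so parsing the string back cannot recover the fraction.

```r
# A timestamp with a fractional second (an assumed example value)
x <- as.POSIXct("1992-02-27 23:03:20.75", tz = "UTC")

# Same format string as above: %S writes whole seconds only
s <- format(x, format = "%Y-%m-%d %H:%M:%S")
s       # "1992-02-27 23:03:20"

# Parsing the string back cannot recover the dropped 0.75 s
back <- as.POSIXct(s, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
lost <- as.numeric(difftime(x, back, units = "secs"))
lost    # 0.75 (up to floating-point error)
```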
Upvotes: 0
Reputation: 724
Both answers above are great. However, a workaround I find more efficient is to pass the dataset to H2O without the date column. When you train a model and then make predictions, the predictions have the same number of rows as the original dataset, so you can simply attach the Date column to the prediction vector or matrix.
Of course, this only works when the predictions cover the same rows (and therefore the same period) as the original dataset, as in backtesting.
Upvotes: 0
Reputation: 2150
Refer to the response by phiver for a more detailed answer. Another simple workaround is to convert the date columns to character before passing them to H2O (if you do not need the column in a date format within H2O). Here is a simple example:
# construct a sample df with a date format column
df <- data.frame(week_of_date = as.Date(c('2015-09-29','2015-10-05')))
str(df$week_of_date)
#  Date[1:2], format: "2015-09-29" "2015-10-05"
# convert the column to character
df$week_of_date <- as.character(df$week_of_date)
str(df$week_of_date)
#  chr [1:2] "2015-09-29" "2015-10-05"
# convert to H2OFRAME and pass back to R data.frame and re-convert to date
df.hex <- as.h2o(df)
df2 <- as.data.frame(df.hex)
df2$week_of_date <- as.Date(df2$week_of_date)
str(df2$week_of_date)
#  Date[1:2], format: "2015-09-29" "2015-10-05"
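If a data frame has several Date columns, the same idea can be wrapped in two small helpers. This is a sketch: date_columns(), dates_to_character() and character_to_dates() are hypothetical names, the as.h2o()/as.data.frame() calls are left as comments so it runs without an H2O cluster, and calling as.character() before as.Date() is a precaution in case a column comes back as a factor rather than character.

```r
# Names of all columns that inherit from Date
date_columns <- function(df) {
  names(df)[vapply(df, inherits, logical(1), what = "Date")]
}

# Convert the given Date columns to "yyyy-mm-dd" strings
dates_to_character <- function(df, cols = date_columns(df)) {
  df[cols] <- lapply(df[cols], as.character)
  df
}

# Convert the given columns back to Date (works for character or factor)
character_to_dates <- function(df, cols) {
  df[cols] <- lapply(df[cols], function(x) as.Date(as.character(x)))
  df
}

df <- data.frame(id = 1:2,
                 week_of_date = as.Date(c("2015-09-29", "2015-10-05")),
                 start = as.Date(c("2015-09-01", "2015-10-01")))
cols <- date_columns(df)          # remember which columns were dates
df_chr <- dates_to_character(df, cols)
# df.hex <- as.h2o(df_chr)        # H2O round trip (needs a running cluster)
# df_chr <- as.data.frame(df.hex)
df2 <- character_to_dates(df_chr, cols)
str(df2)
```

Recording `cols` before the round trip matters, because once the columns are character there is nothing left in the data to say which ones were dates.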
Upvotes: 0
Reputation: 23608
This is a bug in h2o: H2O returns date-times in milliseconds, while R expects seconds. See JIRA issue 3434.
In the meantime, you can recode the date column:
as.Date(structure(returned.dataset$week_of_date/1000, class = c("POSIXct", "POSIXt")))
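A variant of the same recode that also asks for the calendar date in an explicit time zone. The values are instants, not dates: 1440943200000 ms is 2015-08-30 14:00 UTC, which is midnight on 2015-08-31 only in a UTC+10 zone, so converting in UTC can shift every date back by one day. The question's numbers fit a zone such as Australia/Sydney, but that is an inference, so substitute your own zone. ms_posixct_to_date() is a hypothetical helper name; it strips the class with as.numeric() first, so it also works if the column carries the full c("POSIXct", "POSIXt") class, for which `/` is not defined.

```r
# Hypothetical helper: turn a column that H2O returned as milliseconds
# since the Unix epoch (but labelled POSIXct, i.e. seconds) into Dates.
ms_posixct_to_date <- function(x, tz) {
  secs <- as.numeric(x) / 1000                    # milliseconds -> seconds
  instant <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
  as.Date(instant, tz = tz)                       # calendar date as seen in tz
}

# Minimum of week_of_date from the question's Flow summary
x <- structure(1440943200000, class = c("POSIXct", "POSIXt"))
ms_posixct_to_date(x, tz = "UTC")               # "2015-08-30"
ms_posixct_to_date(x, tz = "Australia/Sydney")  # "2015-08-31" (assumed zone)
```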
Upvotes: 2