Reputation: 774
I'm trying to create a function to linearly spline a variable in an h2o dataset, but can't get h2o to evaluate the function properly.
Here's my initial attempt at an intermediate spline:
df <- data.frame( AGE = sample(1:100, 1e6, replace = TRUE))
df_A.hex <- as.h2o( df, 'df_A.hex' )
simple_spline <- function( x, L, U ) min( max(x-L,0), U-L)
spline_vector <- Vectorize( simple_spline, vectorize.args = 'x', USE.NAMES = FALSE )
df_A.hex[, 'AGE_12_24'] <- spline_vector( df_A.hex[, 'AGE'], 12, 24)
And here is the result:
AGE AGE_12_24
1 9 12
2 7 12
3 33 12
4 84 12
5 86 12
6 25 12
I tried using pmin and pmax, on the assumption that maybe it wasn't vectorizing the columns, but I get the following error:
> simple_spline <- function( x, L, U ) pmin( pmax(x-L,0), U-L)
> df_A.hex[, 'AGE_12_24'] <- simple_spline( df_A.hex[, 'AGE'], 12, 24)
Error in each[change] : invalid subscript type 'environment'
I'm guessing it's because pmin and pmax aren't implemented in h2o?
I also tried using apply, but also hit an error:
> simple_spline <- function( x, L, U ) min( max(x-L,0), U-L)
> df_A.hex[, 'AGE_12_24'] <- apply( df_A.hex[, 'AGE'], 1, simple_spline, 12, 24)
> [1] "Lookup failed to find min"
Error in .process.stmnt(stmnt, formalz, envs) :
Don't know what to do with statement: min
I could write a function that iteratively overwrites the spline column like so:
df_A.hex[, 'AGE_12_24'] <- df_A.hex[, 'AGE'] - 12
df_A.hex[, 'AGE_12_24'] <- h2o.ifelse( df_A.hex[, 'AGE_12_24'] < 0, 0, df_A.hex[, 'AGE_12_24'] )
df_A.hex[, 'AGE_12_24'] <- h2o.ifelse( df_A.hex[, 'AGE_12_24'] > 12, 12, df_A.hex[, 'AGE_12_24'] )
This gets me my expected result:
AGE AGE_12_24
1 9 0
2 7 0
3 33 12
4 84 12
5 86 12
6 25 12
But it's a fairly ugly way of getting there. I'd like to know what I'm doing wrong and how to have a function pass on the values to the h2o frame.
Upvotes: 3
Views: 139
Reputation: 5778
Unfortunately you can't pass additional parameters to the H2O R apply() method (I've reported the bug here), and even if you hardcode the original parameters so that apply can evaluate the function, it won't evaluate correctly:
library(h2o)
h2o.init()
df <- data.frame( AGE = c(9,7,33,84,86,25))
df_A.hex <- as.h2o( df, 'df_A.hex' )
L = 12
U = 24
simple_spline <- function(x) { min( max(x-L,0), U-L )}
apply(df_A.hex, 1, simple_spline)
C1
1 -3
2 -5
3 21
4 72
5 74
6 13
I think your best bet is to use your iterative method, or play around with the apply method (not passing additional parameters) until you can trust the results you see.
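On the "what am I doing wrong" part: in base R, min() and max() collapse all of their arguments to a single number, which would explain a column of 12s if H2O's versions reduce a frame column the same way (that is my assumption; I haven't checked it against H2O's source). If you go with the iterative method, it can be wrapped in a function. Below is a minimal sketch shown on a plain R vector so it runs without a cluster. The names clamp_spline and ifelse_fn are made up for illustration and are not part of h2o.

```r
# Base R illustration: min()/max() reduce their input to one value,
# so the whole column collapses to a single number.
age <- c(9, 7, 33, 84, 86, 25)
scalar_result <- min(max(age - 12, 0), 24 - 12)  # one number, not one per row

# Element-wise clamp built from two ifelse passes, mirroring the
# iterative h2o.ifelse lines in the question.
# `clamp_spline` and `ifelse_fn` are hypothetical names, not h2o API.
clamp_spline <- function(x, L, U, ifelse_fn = ifelse) {
  shifted <- x - L                                # distance above the lower knot
  floored <- ifelse_fn(shifted < 0, 0, shifted)   # floor at 0
  ifelse_fn(floored > U - L, U - L, floored)      # cap at U - L
}

spline_result <- clamp_spline(age, 12, 24)

# Untested assumption: on an H2O frame, pass h2o.ifelse in place of ifelse:
# df_A.hex[, 'AGE_12_24'] <- clamp_spline(df_A.hex[, 'AGE'], 12, 24, ifelse_fn = h2o.ifelse)
```

Passing the ifelse function as an argument keeps the clamp logic in one place. The plain-vector call can be checked against the expected result in the question, while the H2O call only uses operations the question already shows working (subtraction, comparison, h2o.ifelse).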
Upvotes: 4