MrT

Reputation: 774

R function not evaluating properly on h2o dataset

I'm trying to create a function to linearly spline a variable in an h2o dataset, but can't get h2o to evaluate the function properly.

Here's my initial attempt at an intermediate spline:

df <- data.frame( AGE = sample(1:100, 1e6, replace = TRUE))
df_A.hex <- as.h2o( df, 'df_A.hex' )

simple_spline <- function( x, L, U ) min( max(x-L,0), U-L)
spline_vector <- Vectorize( simple_spline, vectorize.args = 'x', USE.NAMES = FALSE )

df_A.hex[, 'AGE_12_24'] <- spline_vector( df_A.hex[, 'AGE'], 12, 24) 

And here is the result:

  AGE AGE_12_24
1   9        12
2   7        12
3  33        12
4  84        12
5  86        12
6  25        12

I tried using pmin and pmax, on the assumption that maybe it wasn't vectorizing the columns, but I get the following error:

> simple_spline <- function( x, L, U ) pmin( pmax(x-L,0), U-L)
> df_A.hex[, 'AGE_12_24'] <- simple_spline( df_A.hex[, 'AGE'], 12, 24) 
Error in each[change] : invalid subscript type 'environment'

I'm guessing that's because pmin and pmax aren't implemented in h2o?

I also tried using apply, but also hit an error:

> simple_spline <- function( x, L, U ) min( max(x-L,0), U-L)
> df_A.hex[, 'AGE_12_24'] <- apply( df_A.hex[, 'AGE'], 1, simple_spline, 12, 24) 
[1] "Lookup failed to find min"
Error in .process.stmnt(stmnt, formalz, envs) : 
  Don't know what to do with statement: min

I could write a function that iteratively overwrites the spline column like so:

df_A.hex[, 'AGE_12_24'] <- df_A.hex[, 'AGE'] - 12
df_A.hex[, 'AGE_12_24'] <- h2o.ifelse( df_A.hex[, 'AGE_12_24'] < 0, 0, df_A.hex[, 'AGE_12_24'] )
df_A.hex[, 'AGE_12_24'] <- h2o.ifelse( df_A.hex[, 'AGE_12_24'] > 12, 12, df_A.hex[, 'AGE_12_24'] )

This gets me my expected result:

  AGE AGE_12_24
1   9         0
2   7         0
3  33        12
4  84        12
5  86        12
6  25        12

But it's a fairly ugly way of getting there. I'd like to know what I'm doing wrong and how to have a function pass on the values to the h2o frame.

Upvotes: 3

Views: 139

Answers (1)

Lauren

Reputation: 5778

Unfortunately you can't pass additional parameters to the H2O R apply() method (I've reported the bug here).

Even if you hardcode the parameters so that apply() can evaluate the function, the result still isn't correct:

library(h2o)
h2o.init()
df <- data.frame( AGE = c(9,7,33,84,86,25))
df_A.hex <- as.h2o( df, 'df_A.hex' )
L = 12
U = 24
simple_spline <- function(x) { min( max(x-L,0), U-L )}
apply(df_A.hex, 1, simple_spline)

  C1
1 -3
2 -5
3 21
4 72
5 74
6 13
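
For comparison, here is what the same spline gives on an ordinary R vector, with no H2O involved. This is only a base R sketch: it shows the values the H2O column should end up with, and it illustrates that base R's min() and max() reduce their input to a single number while pmin() and pmax() work element by element. It says nothing about how H2O's apply() handles those functions internally. Note that the apply() output above equals AGE - L for every row, so the clamping never took effect.

```r
# Plain R vector with the same ages as the H2O example (no H2O involved)
age <- c(9, 7, 33, 84, 86, 25)
L <- 12
U <- 24

# min()/max() are reductions: the whole vector collapses to one number
collapsed <- min(max(age - L, 0), U - L)

# pmin()/pmax() are element-wise: one spline value per row
expected <- pmin(pmax(age - L, 0), U - L)

print(collapsed)  # 12
print(expected)   # 0 0 12 12 12 12
```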

I think your best bet is to use your iterative method, or play around with the apply method (not passing additional parameters) until you can trust the results you see.
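
If you stay with the iterative method, it can at least be wrapped in a function so that the call site is a one-liner. Below is a minimal sketch. clamp_spline and its ifelse_fn argument are names I made up; they are not part of h2o. The function relies only on the subtraction, comparisons and h2o.ifelse() calls that already work in your question. The ifelse_fn argument is there so the same logic can be checked on a plain R vector with base ifelse() before you trust it on the frame.

```r
# Hypothetical helper (not part of h2o): clamp (x - L) to the range [0, U - L].
# ifelse_fn defaults to h2o.ifelse for H2OFrame columns; the default is only
# looked up when it is actually used, so base ifelse can be passed instead
# to check the logic on a plain vector.
clamp_spline <- function(x, L, U, ifelse_fn = h2o::h2o.ifelse) {
  width <- U - L
  shifted <- x - L
  shifted <- ifelse_fn(shifted < 0, 0, shifted)   # floor at 0
  ifelse_fn(shifted > width, width, shifted)      # cap at U - L
}

# Sanity check on a plain vector with base ifelse
check <- clamp_spline(c(9, 7, 33, 84, 86, 25), 12, 24, ifelse_fn = ifelse)
print(check)

# On the H2O frame from the question (assumes h2o.init() and df_A.hex exist):
# df_A.hex[, 'AGE_12_24'] <- clamp_spline(df_A.hex[, 'AGE'], 12, 24)
```

This is still the same three steps under the hood, so it should behave exactly like your working version; it just keeps L and U in one place.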

Upvotes: 4
