EngrStudent

Reputation: 2022

Output 'h2o' function results to a vector

I have a question similar to this (link), except that mine concerns the Java tool 'h2o' and its R interface.

In particular, I want to assign an "h2o" object to an element of a vector (or structure, or array). I want to loop through and store several of them without having to enumerate them manually.

I tried the solution at the link but it does not work for 'h2o' objects.

Here is my longer code (warts and all):

#libraries
library(h2o)      #for tree control

#specify data
mydata <- iris[iris$Species!="setosa",]
mydata$Species <- as.factor(as.character(mydata$Species))

#most informative variable is petal length
x1 <- mydata$Petal.Length
x2 <- mydata$Petal.Width

#build classes
C <- matrix(0,nrow=length(x1),ncol=1)
idx1 <- which(mydata$Species == "versicolor",arr.ind=T)
idx2 <- which(mydata$Species != "versicolor",arr.ind=T)
C[idx1] <- +1
C[idx2] <- 0

#start h2o
localH2O = h2o.init(nthreads = -1)

# Run regression GBM on iris.hex data
irisPath = system.file("extdata", "iris.csv", package="h2o")
iris.hex = h2o.uploadFile(localH2O, path = irisPath)
names(iris.hex) <- c("Sepal.Length",
                     "Sepal.Width",
                     "Petal.Length",
                     "Petal.Width",
                     "Species" )

iris2 <- iris
iris2$Species <- unclass(iris$Species)
iris2.hex <- as.h2o(iris2)
iris.hex$Species <- as.factor(iris2.hex$Species)

independent <- c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")
dependent <- "Species"

mare <- numeric()
mae <- matrix(1,nrow=10,ncol=1)

est2.h2o <- vector(mode="list", length=150)

for (i in 1:150){

     est2.h2o[[i]] <- h2o.gbm(y = dependent, 
                         x = independent, 
                         training_frame = iris.hex,
                         distribution="AUTO",
                         ntrees = i, max_depth = 3, min_rows = 2,
                         learn_rate = 0.5)


     pred <- h2o.predict(est2.h2o,newdata=iris.hex)

     err <- iris2$Species-(as.data.frame(pred)$predict+1)

     mae[i] <- mean(abs(err))
     mare[i] <- mean(abs(err)/iris2$Species)

     print(c(i,log10(mae[i])))

}

The error that I get is:

Error in paste0("Predictions/models/", object@model_id, "/frames/", newdata@frame_id) : 
  trying to get slot "model_id" from an object of a basic class ("list") with no slots

My intention is to have a list/structure/array of GBMs that I can then run predict against for the whole data set, and then cull the less informative ones. I'm trying to make a decent "random forest of GBTs" following the steps of Eugene Tuv. I don't have his code.

Questions:
Is there a proper way to pack an h2o GBM, along with a few (hundred) of its buddies, into a single store in R?

If the referenced object is thrown away in java, making this sort of approach unfeasible, is there a feasible variation using the 'gbm' library? If I end up having to use gbm, what is the speed difference vs. h2o?

Upvotes: 0

Views: 903

Answers (1)

Shape

Reputation: 2952

Without seeing the exact parameters you're using, my guess is that the problem is that you're using sapply and not lapply.

sapply often attempts to simplify the result, which is good most of the time. But, if you want something that can contain any kind of object, then you want a list.
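As a small illustration of that difference, here is a base R sketch (no h2o needed). The variable names are made up for the example.

```r
# sapply() simplifies equal-length results into an atomic vector
squares_vec <- sapply(1:3, function(i) i^2)
print(is.list(squares_vec))    # FALSE: a plain numeric vector

# lapply() never simplifies; it always returns a list,
# so each element can hold an arbitrary object (e.g. a fitted model)
squares_list <- lapply(1:3, function(i) i^2)
print(is.list(squares_list))   # TRUE

# Elements of a list are extracted with [[ ]]
print(squares_list[[2]])       # 4
```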

If we define paramListList as a list, where each entry is a list containing your parameters for h2o.gbm:

Ex:

paramListList <- list(list(x = xVALUES1, 
                           y = yVALUES1, 
                           training_frame = tfVALUES1, 
                           model_id = miVALUES1, 
                           checkpoint = checkVALUES1),
                      list(x = xVALUES2, 
                           y = yVALUES2, 
                           training_frame = tfVALUES2, 
                           model_id = miVALUES2, 
                           checkpoint = checkVALUES2)
                     )

then you can do the following:

lapply(paramListList, function(paramlist) do.call(h2o.gbm, paramlist))

which returns all of your results in a single list. Assign it to a variable to keep the models.
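Here is a minimal runnable sketch of the same pattern. It uses lm() as a stand-in for h2o.gbm so that it runs without an H2O cluster; the formulas and variable names are assumptions made for illustration. Note also that the error message in the question says the object handed to h2o.predict was a "list", so in that loop the prediction step most likely needs a single element, est2.h2o[[i]], rather than the whole est2.h2o list.

```r
# Stand-in for paramListList: each entry holds the arguments for one fit.
# lm() replaces h2o.gbm here only so the sketch runs without H2O.
param_list_list <- list(
  list(formula = Petal.Width ~ Petal.Length, data = iris),
  list(formula = Petal.Width ~ Petal.Length + Sepal.Length, data = iris)
)

# One fitted model per parameter set, collected in a plain list
models <- lapply(param_list_list, function(param_list) do.call(lm, param_list))

# Predict with each element in turn. Passing `models` itself would fail,
# because predict() expects a single model, not the list that holds them.
preds <- lapply(models, function(m) predict(m, newdata = iris))

# Mean absolute error per model, useful for culling the weaker ones
mae <- sapply(preds, function(p) mean(abs(iris$Petal.Width - p)))
print(mae)
```

With h2o, the analogous calls would be do.call(h2o.gbm, param_list) for fitting and h2o.predict(model, newdata = frame) for each stored model.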

Upvotes: 1
