Reputation: 2022
I have a question similar to this (link), except that my question refers to the Java tool 'h2o' and its R interface.
In particular, I want to assign an "h2o" object to part of a vector (or structure or array). I want to loop through and store several of them without having to enumerate them manually.
I tried the solution at the link but it does not work for 'h2o' objects.
Here is my longer code (warts and all):
#libraries
library(h2o) #for tree control
#specify data
mydata <- iris[iris$Species!="setosa",]
mydata$Species <- as.factor(as.character(mydata$Species))
#most informative variable is petal length
x1 <- mydata$Petal.Length
x2 <- mydata$Petal.Width
#build classes
C <- matrix(0,nrow=length(x1),ncol=1)
idx1 <- which(mydata$Species == "versicolor",arr.ind=T)
idx2 <- which(mydata$Species != "versicolor",arr.ind=T)
C[idx1] <- +1
C[idx2] <- 0
#start h2o
localH2O = h2o.init(nthreads = -1)
# Run regression GBM on iris.hex data
irisPath = system.file("extdata", "iris.csv", package="h2o")
iris.hex = h2o.uploadFile(localH2O, path = irisPath)
names(iris.hex) <- c("Sepal.Length",
"Sepal.Width",
"Petal.Length",
"Petal.Width",
"Species" )
iris2 <- iris
iris2$Species <- unclass(iris$Species)
iris2.hex <- as.h2o(iris2)
iris.hex$Species <- as.factor(iris2.hex$Species)
independent <- c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")
dependent <- "Species"
mare <- numeric()
mae <- matrix(1,nrow=10,ncol=1)
est2.h2o <- vector(mode="list", length=150)
for (i in 1:150){
est2.h2o[[i]] <- h2o.gbm(y = dependent,
x = independent,
training_frame = iris.hex,
distribution="AUTO",
ntrees = i, max_depth = 3, min_rows = 2,
learn_rate = 0.5)
pred <- h2o.predict(est2.h2o,newdata=iris.hex)
err <- iris2$Species-(as.data.frame(pred)$predict+1)
mae[i] <- mean(abs(err))
mare[i] <- mean(abs(err)/iris2$Species)
print(c(i,log10(mae[i])))
}
The error that I get is:
Error in paste0("Predictions/models/", object@model_id, "/frames/", newdata@frame_id) :
trying to get slot "model_id" from an object of a basic class ("list") with no slots
My intention is to have a list/structure/array of GBMs that I can then run predict against for the whole data set, and cull the less informative ones. I'm trying to make a decent "random forest of GBTs" following the steps of Eugene Tuv. I don't have his code.
Questions:
Is there a proper way to pack the h2o gbm along with a few (hundred) of its buddies, into a single store in r?
If the referenced object is thrown away in java, making this sort of approach unfeasible, is there a feasible variation using the 'gbm' library? If I end up having to use gbm, what is the speed difference vs. h2o?
Upvotes: 0
Views: 903
Reputation: 2952
Without seeing the exact parameters you're using, my guess is that the problem is that you're using sapply and not lapply. sapply often attempts to simplify the result, which is good most of the time. But if you want something that can contain any kind of object, then you want a list.
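As a small base-R illustration of that difference (the toy functions and the names simplified, as_list, and fits are only for demonstration and are not taken from the question):

```r
# sapply() simplifies equal-length results into an atomic vector
simplified <- sapply(1:3, function(i) i^2)

# lapply() always returns a list with one element per input
as_list <- lapply(1:3, function(i) i^2)

is.list(simplified)  # FALSE: a plain numeric vector
is.list(as_list)     # TRUE

# A list can hold arbitrary objects, such as fitted models
fits <- lapply(1:3, function(d) {
  lm(Petal.Width ~ poly(Petal.Length, d), data = iris)
})
class(fits[[2]])     # "lm"
```

Pre-allocating with vector(mode = "list", length = n) and filling with [[i]] in a for loop produces the same kind of container.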
If we define paramListList as a list where each entry is a list containing your parameters for h2o.gbm, for example:
paramListList <- list(list(x = xVALUES1,
                           y = yVALUES1,
                           training_frame = tfVALUES1,
                           model_id = miVALUES1,
                           checkpoint = checkVALUES1),
                      list(x = xVALUES2,
                           y = yVALUES2,
                           training_frame = tfVALUES2,
                           model_id = miVALUES2,
                           checkpoint = checkVALUES2)
)
Then you can do the following:
lapply(paramListList, function(paramlist) do.call(h2o.gbm, paramlist))
This returns all of your fitted models in one list; assign the result to a variable to keep them.
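Separately, the error message in the question says that h2o.predict received an object of class "list": the loop passes the whole list est2.h2o rather than the model that was just fitted. Below is a minimal sketch of the loop with the element selected by [[i]]. It assumes a recent h2o release and a local cluster; the names iris_hex, models, and err_rate, the shortened range of 10 models, and the use of a misclassification rate instead of the question's mae are illustrative choices, not part of the original post.

```r
library(h2o)
h2o.init(nthreads = -1)

# Assumption: Species is left as a factor, so h2o.gbm fits a classifier.
# This simplifies the data preparation used in the question.
iris_hex <- as.h2o(iris)
independent <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
dependent <- "Species"

n_models <- 10
models <- vector(mode = "list", length = n_models)
err_rate <- numeric(n_models)

for (i in seq_len(n_models)) {
  models[[i]] <- h2o.gbm(y = dependent, x = independent,
                         training_frame = iris_hex,
                         ntrees = i, max_depth = 3, min_rows = 2,
                         learn_rate = 0.5)
  # Pass the single model models[[i]], not the whole list
  pred <- h2o.predict(models[[i]], newdata = iris_hex)
  predicted <- as.character(as.data.frame(pred)$predict)
  err_rate[i] <- mean(predicted != as.character(iris$Species))
}
```

Each element of models is an R-side handle that refers to a model held by the H2O cluster, so as long as the cluster is running and the model has not been removed, it should be usable later, for example h2o.predict(models[[3]], newdata = iris_hex).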
Upvotes: 1