abu

Reputation: 777

Error while estimating xgboost in h2o after update to 3.18

I encountered the known issue of not being able to save an XGBoost model and load it later to obtain predictions; this was supposedly fixed in H2O 3.18 (the problem existed in 3.16). I updated the package from H2O's website (downloadable zip), and now a model that previously ran without issue gives the following error:
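For context, the save/load round trip that exposed the original issue looks roughly like this (a sketch using the standard `h2o.saveModel`/`h2o.loadModel` functions; the path is a placeholder and `xgb`/`train` refer to the model and frame from the example below):

```r
library(h2o)
h2o.init()

# Save the trained XGBoost model to disk (tempdir() is illustrative)
path <- h2o.saveModel(xgb, path = tempdir(), force = TRUE)

# Reload it, e.g. in a fresh session, and score new data
xgb_reloaded <- h2o.loadModel(path)
preds <- h2o.predict(xgb_reloaded, newdata = train)
```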

Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = urlSuffix,  : 
  Unexpected CURL error: Failed to connect to localhost port 54321: Connection refused

This happens only with XGBoost (binary classification); the other models I use work fine. H2O is, of course, initialised, and a previously working model trains without problems. Does anyone have an idea what the issue could be?
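Since the error comes from the REST layer rather than the model itself, a "Connection refused" on port 54321 usually suggests the JVM backing the H2O cluster has gone down. One way to check from R (a sketch; `h2o.clusterIsUp()` and `h2o.clusterInfo()` are part of the h2o package, but the diagnosis itself is an assumption):

```r
library(h2o)
h2o.init()

# Returns FALSE if the backing JVM has died, which would explain
# "Failed to connect to localhost port 54321: Connection refused"
h2o.clusterIsUp()

# Prints version, uptime and node health of the running cluster
h2o.clusterInfo()
```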

EDIT: Here is a reproducible example (based on Erin's answer) that produces the error:

library(h2o)
library(caret)
library(dplyr)   # needed for %>% and mutate() in version 1 below
h2o.init()

# Import a sample binary outcome train set into H2O
train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")

# Identify predictors and response
y <- "response"
x <- setdiff(names(train), y)

# Assigning fold column
set.seed(1)
cv_folds <- createFolds(as.data.frame(train)$response,
                        k = 5,
                        list = FALSE,
                        returnTrain = FALSE)

# version 1 (run either this or version 2, not both)
train <- train %>%
    as.data.frame() %>%
    mutate(fold_assignment = cv_folds) %>%
    as.h2o()

# version 2
train <- h2o.cbind(train, as.h2o(cv_folds))
names(train)[ncol(train)] <- "fold_assignment"


# For binary classification, response should be a factor
train[,y] <- as.factor(train[,y])

xgb <- h2o.xgboost(x = x,
                   y = y, 
                   seed = 1,
                   training_frame = train,
                   fold_column = "fold_assignment",
                   keep_cross_validation_predictions = TRUE,
                   eta = 0.01,
                   max_depth = 3,
                   sample_rate = 0.8,
                   col_sample_rate = 0.6,
                   ntrees = 500,
                   reg_lambda = 0,
                   reg_alpha = 1000,
                   distribution = 'bernoulli') 

Both versions of creating the train frame result in the same error.

Upvotes: 0

Views: 1019

Answers (1)

Erin LeDell

Reputation: 8819

You didn't say whether you have re-trained the models using 3.18. In general, H2O only guarantees model compatibility between minor releases within the same major version of H2O. If you have not re-trained the models, that's probably the reason XGBoost is not working properly. If you have re-trained the models with 3.18 and XGBoost is still not working, then please post a reproducible example and we will look into it further.
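To confirm which backend your R session is actually talking to after the upgrade, you can compare the installed package version against the running cluster's version (a sketch; both calls are standard h2o R functions):

```r
library(h2o)
h2o.init()

# Version of the installed h2o R package
packageVersion("h2o")

# Version of the running H2O backend; a mismatch between the two
# can produce REST errors like the one above
h2o.getVersion()
```

If the versions disagree, shutting down the old cluster with `h2o.shutdown()` and re-running `h2o.init()` picks up the upgraded backend.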

EDIT: I am adding a reproducible example (the only difference between your code and this code is that I am not using a fold_column here). This runs fine on 3.18.0.2. Without a reproducible example that produces the error, I can't help you any further.

library(h2o)
h2o.init()

# Import a sample binary outcome train set into H2O
train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")

# Identify predictors and response
y <- "response"
x <- setdiff(names(train), y)

# For binary classification, response should be a factor
train[,y] <- as.factor(train[,y])

xgb <- h2o.xgboost(x = x,
                   y = y, 
                   seed = 1,
                   training_frame = train,
                   keep_cross_validation_predictions = TRUE,
                   eta = 0.01,
                   max_depth = 3,
                   sample_rate = 0.8,
                   col_sample_rate = 0.6,
                   ntrees = 500,
                   reg_lambda = 0,
                   reg_alpha = 1000,
                   distribution = 'bernoulli') 

Upvotes: 1
