I have a dataset with 25,000 rows and 761 columns, including one binary response column. My binary response has values '-1' and '1'. I am trying to run xgboost on it with the call below, and I keep getting an error:
xg_base <- xgboost(data = features, label = output, objective = "binary:logistic", eta = 1, nthreads = 2,
                   nrounds = 10, verbose = T, print.every.n = 5)
Error in xgb.iter.update(bst$handle, dtrain, i - 1, obj) :
label must be in [0,1] for logistic regression
I changed the levels of my response using the following command:
levels(output)[levels(output)=="-1"] <- "0"
I still get the same error and am not sure what the issue is. One important point is that this is a rare-event detection problem: positive cases make up only 1% of the observations. Could that be the reason for the error?
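A short sketch of what most likely goes wrong, assuming xgboost turns a factor label into numbers the same way as.numeric() does (an assumption about the package internals, not checked here). Renaming a level changes its label, but the integer code stored for each observation stays 1 or 2. The toy `output` vector below is a hypothetical stand-in for the real response column.

```r
# Hypothetical stand-in for the question's `output` column:
# a factor with levels "-1" and "1" (real data not shown in the question)
output <- factor(c("-1", "1", "-1", "-1"), levels = c("-1", "1"))

# The relabelling step from the question
levels(output)[levels(output) == "-1"] <- "0"

print(levels(output))      # [1] "0" "1"    labels changed ...
print(as.numeric(output))  # [1] 1 2 1 1    ... but the codes are still 1 and 2
```

If the label is coerced like this, xgboost would see values 1 and 2, which are outside [0, 1] and would trigger the same error regardless of how rare the positive class is.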
In case this helps anyone converting a factor variable with levels 0 and 1 into labels for XGBoost: you need to subtract 1 after converting to integer (or numeric), because as.integer() on a factor returns its internal level codes (1, 2, ...), not the level labels:
> f <- as.factor(c(0, 1, 1, 0))
# XGBoost will not accept this for label
> as.integer(f)
[1] 1 2 2 1
# Correct label
> as.integer(f) - 1
[1] 0 1 1 0
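An alternative sketch that avoids depending on level order: compare against the level label and coerce the logical result to 0/1. This works the same whether `output` is a factor or a plain numeric vector of -1/1 values; the toy data below is hypothetical.

```r
# Hypothetical response with the question's original -1/1 coding, stored as a factor
output <- factor(c("-1", "1", "-1", "1"), levels = c("-1", "1"))

# Compare on the label text ("1" vs "-1"), not on the internal codes,
# then turn TRUE/FALSE into 1/0
label <- as.integer(as.character(output) == "1")
print(label)  # [1] 0 1 0 1
```

Unlike as.integer(f) - 1, this does not silently give wrong labels if the factor's levels happen to be stored in a different order.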
After you change the -1's to 0's, convert output from a factor to numeric:
output <- as.numeric(levels(output))[output]
I don't think the class imbalance (rare events) is related to the error; the error only concerns the values passed as labels.
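A minimal end-to-end sketch of the steps in this answer on a small hypothetical `output` vector, with a sanity check before training. Indexing as.numeric(levels(output)) by the factor uses the factor's integer codes, so each observation picks up the numeric value of its own level.

```r
# Hypothetical stand-in for the response column
output <- factor(c("-1", "1", "-1", "-1"), levels = c("-1", "1"))
levels(output)[levels(output) == "-1"] <- "0"

# Convert factor -> numeric level values (not internal codes)
output <- as.numeric(levels(output))[output]
print(output)  # [1] 0 1 0 0

# Check before training: binary:logistic needs numeric labels in [0, 1]
stopifnot(is.numeric(output), all(output %in% c(0, 1)))
```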