Reputation: 77

xgboost error message about numerical variable and label

I am using the xgboost function in R and get the following error message:

bst <- xgboost(data = germanvar, label = train$Creditability, max.depth = 2, eta = 1,nround = 2, objective = "binary:logistic")

Error in xgb.get.DMatrix(data, label, missing, weight) : 
  xgboost only support numerical matrix input,
           use 'data.matrix' to transform the data.
In addition: Warning message:
In xgb.get.DMatrix(data, label, missing, weight) :
  xgboost: label will be ignored.

Following is my full code.

credit<-read.csv("http://freakonometrics.free.fr/german_credit.csv", header=TRUE)
library(caret)
set.seed(1000)
intrain<-createDataPartition(y=credit$Creditability, p=0.7, list=FALSE) 
train<-credit[intrain, ]
test<-credit[-intrain, ]


germanvar<-train[,2:21]
str(germanvar)
bst <- xgboost(data = germanvar, label = train$Creditability, max.depth = 2, eta = 1,
               nround = 2, objective = "binary:logistic")

The data has a mixture of continuous and categorical variables.

Because the error message says that only numerical input is supported, I made sure all the variables were treated as numeric, but the same error message still appears.

How can I solve this problem?

Upvotes: 4

Answers (2)

Tobias

Reputation: 1

I got the following error message:

Error in xgb.DMatrix(as.matrix(trainX), label = trainY$myvar) :
  REAL() can only be applied to a 'numeric', not a 'logical'

It turned out that my data frame accidentally had 0 rows at the step where I created the DMatrix.

So it can be worth checking the number of rows in the data frame before creating the DMatrix.
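A minimal sketch of such a check, using only base R. The data frames and the helper name `check_nonempty` are hypothetical stand-ins, not part of xgboost or of the code above:

```r
# Hypothetical stand-in for the feature data frame (values are made up).
trainX <- data.frame(a = c(1, 2, 3), b = c(0.5, 1.5, 2.5))

# A filter that accidentally drops every row, leaving a 0-row data frame.
trainX_empty <- trainX[trainX$a > 10, ]

# Hypothetical guard to call before building the DMatrix.
check_nonempty <- function(df) {
  if (nrow(df) == 0) {
    stop("data frame has 0 rows; check the filtering/subsetting steps")
  }
  invisible(TRUE)
}

check_nonempty(trainX)  # passes silently

# The empty data frame is caught with a readable message instead of
# failing later inside the DMatrix construction.
result <- tryCatch(check_nonempty(trainX_empty),
                   error = function(e) conditionMessage(e))
print(result)
```

Placing the guard right before the DMatrix call makes the real cause visible, rather than a type error raised further down.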

Upvotes: 0

T. Scharf

Reputation: 4844

If you have categorical variables that are represented as numbers, that is not an ideal representation, but with deep enough trees you can get away with it: the trees will eventually partition the values. I don't prefer that approach, but it keeps your column count minimal and can succeed given the right setup.
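For comparison, a common alternative that this answer does not use is one-hot encoding, which gives each category its own 0/1 column at the cost of more columns. The sketch below uses base R's `model.matrix` on a made-up toy data frame; the column names are illustrative only:

```r
# Toy data: Purpose is a categorical code stored as integers (made-up values).
df <- data.frame(Purpose = c(0L, 2L, 9L, 2L),
                 Duration = c(6, 48, 12, 24))

# Declare the column as a factor so it is treated as categorical.
df$Purpose <- factor(df$Purpose)

# "- 1" drops the intercept, so every level of Purpose gets its own column.
X <- model.matrix(~ . - 1, data = df)

print(colnames(X))
print(typeof(X))
```

The result is already a double-precision numeric matrix, so it could be passed as `data` in the same way as the matrix built further down.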

Note that xgboost takes a numeric matrix as data and a numeric vector as label.

NOT INTEGERS :)
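To see the distinction without xgboost, the base R sketch below converts a toy all-integer data frame (made-up values standing in for `germanvar`) and inspects the storage type at each step. It uses `storage.mode<-`, which should have the same effect here as setting `mode` to `'double'`:

```r
# Toy stand-in for germanvar: every column is read as integer.
df <- data.frame(Duration = c(6L, 48L, 12L),
                 Purpose  = c(2L, 0L, 9L))

m <- as.matrix(df)
print(typeof(m))   # "integer": a matrix, but not double precision yet

storage.mode(m) <- "double"
print(typeof(m))   # "double": dimensions and column names are kept

# Same idea for the label vector.
label <- as.numeric(c(1L, 0L, 1L))
print(typeof(label))   # "double"
```

Checking `typeof()` on both the matrix and the label before calling `xgboost` is a quick way to confirm the inputs are in the expected form.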

The following code will train with the inputs cast properly:

credit<-read.csv("http://freakonometrics.free.fr/german_credit.csv", header=TRUE)
library(caret)
set.seed(1000)
intrain<-createDataPartition(y=credit$Creditability, p=0.7, list=FALSE) 
train<-credit[intrain, ]
test<-credit[-intrain, ]


germanvar<-train[,2:21]
label <- as.numeric(train$Creditability) ## make it a numeric NOT integer
data <-  as.matrix(germanvar)  # to matrix
mode(data) <- 'double'  # to numeric i.e double precision


bst <- xgboost(data = data, label = label, max.depth = 2, eta = 1,
               nround = 2, objective = "binary:logistic")

Upvotes: 7
