Reputation: 77
I use the xgboost function in R, and I get the following error message:
bst <- xgboost(data = germanvar, label = train$Creditability, max.depth = 2, eta = 1,nround = 2, objective = "binary:logistic")
Error in xgb.get.DMatrix(data, label, missing, weight) :
xgboost only support numerical matrix input,
use 'data.matrix' to transform the data.
In addition: Warning message:
In xgb.get.DMatrix(data, label, missing, weight) :
xgboost: label will be ignored.
Here is my full code:
credit<-read.csv("http://freakonometrics.free.fr/german_credit.csv", header=TRUE)
library(caret)
set.seed(1000)
intrain<-createDataPartition(y=credit$Creditability, p=0.7, list=FALSE)
train<-credit[intrain, ]
test<-credit[-intrain, ]
germanvar<-train[,2:21]
str(germanvar)
bst <- xgboost(data = germanvar, label = train$Creditability, max.depth = 2, eta = 1,
nround = 2, objective = "binary:logistic")
The data has a mixture of continuous and categorical variables.
Because the error message says that only numeric input is supported, I made sure that all the variables are treated as numeric, but the error message still appears.
How can I solve this problem?
Upvotes: 4
Views: 7986
Reputation: 1
I got the following error message:
Error in xgb.DMatrix(as.matrix(trainX), label = trainY$myvar) :
REAL() can only be applied to a 'numeric', not a 'logical'
It turned out that my data frame accidentally had 0 rows at the step where I created the DMatrix.
So it is worth checking the number of rows in the data frame before creating the DMatrix.
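A minimal sketch of such a check, in base R only. The helper name `check_nonempty` and the toy data frame are made up for illustration; the idea is just to fail with a readable message before the data ever reaches `xgb.DMatrix`.

```r
# Hypothetical helper: stop early if the data frame is empty.
check_nonempty <- function(df) {
  if (nrow(df) == 0) {
    stop("training data has 0 rows; check the subsetting step")
  }
  invisible(df)
}

# Toy data standing in for trainX (made up for this example).
trainX <- data.frame(a = c(1, 2, 3), b = c(0, 1, 0))
check_nonempty(trainX)  # passes silently

# A filter that matches nothing leaves a 0-row data frame.
empty <- trainX[trainX$a > 10, , drop = FALSE]
result <- tryCatch(check_nonempty(empty),
                   error = function(e) conditionMessage(e))
```

Here `result` holds the error message instead of the cryptic REAL() failure, which makes an accidental empty subset much easier to spot.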
Upvotes: 0
Reputation: 4844
If you have categorical variables that are represented as numbers, that is not an ideal representation, but with deep enough trees you can get away with it, because the trees will eventually partition the values. I don't prefer that approach, but it keeps your column count minimal and can succeed given the right setup.
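If you would rather not rely on numeric codes, one common alternative (not part of the code in this answer) is to one-hot encode the categorical columns with base R's `model.matrix`. This is only a sketch: the column names `amount` and `purpose` are made up, and you would have to decide yourself which columns of the real data are categorical.

```r
# Toy data: 'purpose' is a category stored as integer codes (made-up example).
df <- data.frame(
  amount  = c(1000, 2500, 400),
  purpose = c(1L, 3L, 1L)
)

# Mark the coded column as a factor so it is treated as categorical.
df$purpose <- factor(df$purpose)

# '~ . - 1' drops the intercept; the factor is expanded into 0/1 indicator columns.
X <- model.matrix(~ . - 1, data = df)
```

The result `X` is a numeric (double) matrix with one indicator column per level of `purpose`. The trade-off is the one mentioned above: more columns, but no artificial ordering of the categories.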
Note that xgboost takes a numeric matrix as data and a numeric vector as label. NOT INTEGERS :)
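To see what the distinction looks like in R, here is a small base R sketch with made-up columns. Whole-number columns typically come in as integer, and `as.matrix` keeps that storage type until you change it explicitly.

```r
# Toy data frame with integer columns (values are made up).
df <- data.frame(duration = c(6L, 48L, 12L),
                 amount   = c(1169L, 5951L, 2096L))

m <- as.matrix(df)
type_before <- typeof(m)     # storage type right after as.matrix

# Switch the storage type to double; dimensions and column names are kept.
storage.mode(m) <- "double"
type_after <- typeof(m)
```

Comparing `type_before` and `type_after` is a quick way to confirm the matrix really is double precision before handing it to xgboost.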
The following code will train with the inputs cast properly:
credit<-read.csv("http://freakonometrics.free.fr/german_credit.csv", header=TRUE)
library(caret)
set.seed(1000)
intrain<-createDataPartition(y=credit$Creditability, p=0.7, list=FALSE)
train<-credit[intrain, ]
test<-credit[-intrain, ]
germanvar<-train[,2:21]
label <- as.numeric(train$Creditability) ## make it a numeric NOT integer
data <- as.matrix(germanvar) # to matrix
mode(data) <- 'double' # to numeric i.e double precision
bst <- xgboost(data = data, label = label, max.depth = 2, eta = 1,
nround = 2, objective = "binary:logistic")
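Since the test set needs the same treatment before prediction, it may help to wrap the casting in one function. This is a sketch only: `prepare_xgb_inputs` is a hypothetical helper, and the toy columns `Duration` and `Amount` are invented for the example. The label check assumes a 0/1 target, which is what `binary:logistic` expects.

```r
# Hypothetical helper: build a double matrix and a numeric label from a data frame.
prepare_xgb_inputs <- function(df, label_col, feature_cols) {
  data <- as.matrix(df[, feature_cols, drop = FALSE])
  storage.mode(data) <- "double"        # integer -> double
  label <- as.numeric(df[[label_col]])  # integer -> numeric

  # Basic sanity checks before training or predicting.
  stopifnot(nrow(data) > 0,
            length(label) == nrow(data),
            all(label %in% c(0, 1)))
  list(data = data, label = label)
}

# Toy data standing in for the credit data (values are made up).
toy <- data.frame(Creditability = c(1L, 0L, 1L),
                  Duration      = c(6L, 48L, 12L),
                  Amount        = c(1169L, 5951L, 2096L))

inputs <- prepare_xgb_inputs(toy, "Creditability", c("Duration", "Amount"))
```

With the real data you could call it as `prepare_xgb_inputs(train, "Creditability", 2:21)` and again with `test`, then pass `inputs$data` and `inputs$label` to `xgboost` as in the code above.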
Upvotes: 7