I used the gbm() function to create the model and I want to get the accuracy. Here is my code:
df<-read.csv("http://freakonometrics.free.fr/german_credit.csv", header=TRUE)
str(df)
F=c(1,2,4,5,7,8,9,10,11,12,13,15,16,17,18,19,20,21)
for(i in F) df[,i]=as.factor(df[,i])
library(caret)
set.seed(1000)
intrain<-createDataPartition(y=df$Creditability, p=0.7, list=FALSE)
train<-df[intrain, ]
test<-df[-intrain, ]
install.packages("gbm")
library("gbm")
df_boosting<-gbm(Creditability~.,distribution = "bernoulli", n.trees=100, verbose=TRUE, interaction.depth=4,
shrinkage=0.01, data=train)
summary(df_boosting)
yhat.boost<-predict(df_boosting, newdata=test, n.trees=100)
mean((yhat.boost-test$Creditability)^2)
However, calling summary() on the model produces this error:
Error in plot.window(xlim, ylim, log = log, ...) :
need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
And when I compute the MSE with mean(), I also get this warning:
Warning message:
In Ops.factor(yhat.boost, test$Creditability) :
‘-’ not meaningful for factors
Do you know why these two errors appear? Thank you in advance.
In your code the problem is the definition of the (binary) response variable Creditability. You convert it to a factor (column 1 is in F), but gbm with distribution = "bernoulli" needs a numeric 0/1 response variable.
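The second warning in the question comes straight from R's factor semantics: arithmetic operators such as `-` are not defined for factors, so `Ops.factor` warns and returns `NA`, which makes the whole mean `NA`. The base-R sketch below uses a small hypothetical vector instead of the real data to show the failure and the usual way to turn a 0/1 factor back into numbers. Note that `as.numeric()` on its own returns the internal level codes (1, 2), not the original labels (0, 1).

```r
# Hypothetical 0/1 response stored as a factor, as the question's code does
y_fac <- factor(c(1, 0, 1, 1))
pred  <- c(0.9, 0.2, 0.6, 0.8)   # hypothetical predictions

# Arithmetic on a factor warns "'-' not meaningful for factors" and yields NA
bad <- suppressWarnings(pred - y_fac)
print(bad)     # NA NA NA NA

# as.numeric() on a factor returns the level codes, not the labels
codes <- as.numeric(y_fac)
print(codes)   # 2 1 2 2

# Convert through the labels to recover the original 0/1 values
y_num <- as.numeric(as.character(y_fac))
print(y_num)   # 1 0 1 1

# With a numeric response the squared-error computation works
mse <- mean((pred - y_num)^2)
print(mse)
```

In the question's setup the simpler route is the one taken below: never convert Creditability to a factor in the first place.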
Leave column 1 out of F so that Creditability keeps its original integer type:
df <- read.csv("http://freakonometrics.free.fr/german_credit.csv", header=TRUE)
F <- c(2,4,5,7,8,9,10,11,12,13,15,16,17,18,19,20,21)
for(i in F) df[,i]=as.factor(df[,i])
str(df)
Creditability is now a binary numeric variable:
'data.frame': 1000 obs. of 21 variables:
$ Creditability : int 1 1 1 1 1 1 1 1 1 1 ...
$ Account.Balance : Factor w/ 4 levels "1","2","3","4": 1 1 2 1 1 1 1 1 4 2 ...
$ Duration.of.Credit..month. : int 18 9 12 12 12 10 8 6 18 24 ...
$ Payment.Status.of.Previous.Credit: Factor w/ 5 levels "0","1","2","3",..: 5 5 3 5 5 5 5 5 5 3 ...
$ Purpose : Factor w/ 10 levels "0","1","2","3",..: 3 1 9 1 1 1 1 1 4 4 ...
...
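A cheap guard against this mistake is to choose the categorical columns while explicitly excluding the response, and to assert that the response is numeric 0/1 before calling gbm(). The sketch below uses a tiny hypothetical data frame (toy values, column names borrowed from the real data) so it runs without downloading anything; the `setdiff()` step and the `stopifnot()` check are illustrative additions, not part of the original answer.

```r
# Tiny hypothetical stand-in for the German credit data (toy values)
df <- data.frame(
  Creditability   = c(1L, 0L, 1L, 1L, 0L),
  Account.Balance = c(1L, 1L, 2L, 4L, 1L),
  Purpose         = c(2L, 0L, 9L, 3L, 0L)
)

# Columns meant to be categorical; column 1 (the response) is removed
# from the list even if it was included by accident
F <- setdiff(c(1, 2, 3), 1)
for (i in F) df[, i] <- as.factor(df[, i])

# Guard before fitting: distribution = "bernoulli" expects a numeric 0/1 response
stopifnot(is.numeric(df$Creditability),
          all(df$Creditability %in% c(0, 1)))

str(df)
```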
With that change, the rest of the code runs without errors:
library(caret)
set.seed(1000)
intrain <- createDataPartition(y=df$Creditability, p=0.7, list=FALSE)
train <- df[intrain, ]
test <- df[-intrain, ]
library("gbm")
df_boosting <- gbm(Creditability~., distribution = "bernoulli",
n.trees=100, verbose=TRUE, interaction.depth=4,
shrinkage=0.01, data=train)
par(mar=c(3,14,1,1))       # wide left margin so the long variable names fit
summary(df_boosting, las=2)  # las=2: axis labels perpendicular to the axis
##########
var rel.inf
Account.Balance Account.Balance 36.8578980
Credit.Amount Credit.Amount 12.0691120
Duration.of.Credit..month. Duration.of.Credit..month. 10.5359895
Purpose Purpose 10.2691646
Payment.Status.of.Previous.Credit Payment.Status.of.Previous.Credit 9.1296524
Value.Savings.Stocks Value.Savings.Stocks 4.9620662
Instalment.per.cent Instalment.per.cent 3.3124252
...
##########
yhat.boost <- predict(df_boosting , newdata=test, n.trees=100)
mean((yhat.boost-test$Creditability)^2)
[1] 0.2719788
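The question asked for accuracy, whereas the number above is a mean squared error on gbm's default prediction scale. For distribution = "bernoulli", predict() returns log-odds unless you pass type = "response", so 0.2719788 is a squared error between log-odds and 0/1 labels. A minimal sketch of turning predictions into an accuracy, assuming a 0.5 cutoff (common but arbitrary) and using made-up log-odds and labels so it runs without the data:

```r
# Hypothetical log-odds, standing in for predict(df_boosting, newdata=test, n.trees=100)
log_odds <- c(1.2, -0.4, 0.3, 2.0, -1.5)
y_true   <- c(1, 0, 0, 1, 1)   # hypothetical 0/1 labels

# Map log-odds to probabilities with the logistic function; with the real model,
# predict(df_boosting, newdata=test, n.trees=100, type="response") does this directly
prob <- plogis(log_odds)

# Classify with an assumed 0.5 cutoff and compare with the true labels
y_hat    <- as.integer(prob >= 0.5)
accuracy <- mean(y_hat == y_true)

# Confusion table of predicted vs. actual classes
print(table(predicted = y_hat, actual = y_true))
print(accuracy)   # 0.6
```

Since caret is already loaded, caret::confusionMatrix() applied to factor versions of the predicted and actual classes should report the same accuracy along with sensitivity, specificity and related statistics.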
Hope this helps.