Reputation: 73
I would like to know how I can draw a ROC plot with R. I have created a logistic regression model with k-fold cross-validation.
dt3 - main dataset
dt3Training - training split made from main dataset
dt3Test - test split made from main dataset
Below is the code that I used for logistic regression:
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5, savePredictions = TRUE)
modelfit <- train (Attrition~., data=dt3, method="glm", family="binomial", trControl=ctrl)
pred = predict (modelfit, newdata=dt3Test)
confusionMatrix(data=pred, dt3Test$Attrition)
My problem is that pred does not show up as a prediction object; instead it shows up as a data table. As a result, the code below gives an error:
perf1 <- performance(pred,"tpr","fpr")
plot(perf1)
I would be really grateful if you can help me with this.
UPDATE: After reading the question "k-fold cross validation - how to get the prediction automatically?", I changed my code to the following:
library("caret", lib.loc="~/R/win-library/3.4")
load (df) ## load main dataset "df"
tc <- trainControl("cv",10,savePred=T) ##create folds
(fit<-train(Attrition~.,data=df,method="glm",family="binomial",trControl=tc)) ##train model, predict Attrition with all other variables
I would like to try the code below by Claus Wilke; however, I got confused, as I only have my main data (df) and my model (fit).
data.frame(predictor = predict(fit, df),
known.truth = fit$Attrition,
model = "fit")
or
data.frame(predictor = predict(fit, tc),
known.truth = tc$Attrition,
model = "fit")
Sorry if I am asking a really stupid question, but I don't have much time left to finish my project, and I don't have previous experience with R.
Upvotes: 1
Views: 8103
Reputation: 73
I found a way to plot a ROC curve. I will write down the code from the very beginning: creating the model, then the ROC curve.
Creating logistic regression with k folds:
library("caret", lib.loc="~/R/win-library/3.4")
load (df)
## load main dataset "df"
tc <- trainControl("cv",10,savePred=T)
##create folds
(fit<-train (Attrition~.,data=df,method="glm",family="binomial",trControl=tc))
##train model, predict Attrition with all other variables
For the ROC Curve:
library(ggplot2)
library(ROCR)
predict0 <- predict(fit, type = 'raw')
ROCRpred0 <- prediction(as.numeric(predict0),as.numeric(df$Attrition))
ROCRperf0<- performance(ROCRpred0, 'tpr', 'fpr')
plot(ROCRperf0, colorize=TRUE, text.adj=c(-0.2,1.7))
I was able to get a plot with this code. I hope this helps other people with the same problem.
[Figure: Sample ROC Curve - discrete values]
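A caveat on the code above: predict(fit, type = 'raw') returns class labels rather than scores, so ROCR only has one threshold to work with, which is why the curve looks discrete. A minimal sketch of the same idea using class probabilities follows. It assumes the Pima.tr data from MASS as a stand-in for df (the Attrition data is not available here), with "Yes" as the positive class. Like the code above, it predicts on the data the model was trained on, so the curve is an in-sample one.

```r
library(caret)
library(ROCR)
library(MASS)  # Pima.tr is only a stand-in for the asker's df

set.seed(42)
tc <- trainControl(method = "cv", number = 10)
fit <- train(type ~ ., data = Pima.tr, method = "glm",
             family = "binomial", trControl = tc)

# type = "prob" returns one probability column per class level
prob <- predict(fit, newdata = Pima.tr, type = "prob")

# ROCR needs scores for the positive class ("Yes") plus the known labels
ROCRpred <- prediction(prob[["Yes"]], Pima.tr$type)
ROCRperf <- performance(ROCRpred, "tpr", "fpr")
plot(ROCRperf, colorize = TRUE)

# Area under the curve, stored in the y.values slot of the "auc" measure
auc <- performance(ROCRpred, "auc")@y.values[[1]]
```

With probabilities as scores, every distinct predicted value becomes a threshold, so the curve is traced out point by point instead of having a single corner.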
Upvotes: 1
Reputation: 17790
Since you don't provide a reproducible example, I'll use a different dataset and model. For ggplot2, the package plotROC provides generic ROC plotting capabilities that work with any fitted model. You just need to place the known truth and your predicted probabilities (or another numerical predictor variable) into a data frame and then hand it to the geom. An example follows.
library(MASS) # for Pima data sets
library(ggplot2)
library(plotROC)
# train model on training data
glm.out.train <- glm(type ~ npreg + glu + bp + bmi + age,
                     data = Pima.tr,
                     family = binomial)
# combine linear predictor and known truth for training and test datasets into one data frame
df <- rbind(data.frame(predictor = predict(glm.out.train, Pima.tr),
                       known.truth = Pima.tr$type,
                       model = "train"),
            data.frame(predictor = predict(glm.out.train, Pima.te),
                       known.truth = Pima.te$type,
                       model = "test"))
# the aesthetic names are not the most intuitive
# `d` (disease) holds the known truth
# `m` (marker) holds the predictor values
ggplot(df, aes(d = known.truth, m = predictor, color = model)) +
  geom_roc(n.cuts = 0)
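The same data frame layout can probably be filled from a caret model trained with cross-validation, which is the situation in the question. The sketch below is one possible way to do it, not the only one. It assumes Pima.tr as a stand-in for the asker's data, sets classProbs = TRUE and savePredictions = "final" in trainControl so that the held-out predictions are kept in fit$pred, and uses the name cv_df for illustration only.

```r
library(caret)
library(ggplot2)
library(plotROC)
library(MASS)  # Pima.tr stands in for the asker's df

set.seed(42)
# classProbs = TRUE keeps per-class probabilities for the held-out folds
tc <- trainControl(method = "cv", number = 10,
                   classProbs = TRUE, savePredictions = "final")
fit <- train(type ~ ., data = Pima.tr, method = "glm",
             family = "binomial", trControl = tc)

# fit$pred has one row per held-out observation:
# `obs` is the known truth, `Yes` the predicted probability of class "Yes"
cv_df <- data.frame(predictor = fit$pred$Yes,
                    known.truth = fit$pred$obs,
                    model = "cv")

p <- ggplot(cv_df, aes(d = known.truth, m = predictor, color = model)) +
  geom_roc(n.cuts = 0)
print(p)

# area under the cross-validated ROC curve
cv_auc <- calc_auc(p)
```

Because every row of fit$pred comes from a fold in which that observation was held out, this curve reflects out-of-sample performance rather than the fit on the training data.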
Upvotes: 3