Gianluca

R: which command generates a confusion matrix from the results of rpart() and predict()?

Which command should I use in R to build a confusion matrix after using rpart() and predict() to fit and apply a classification model?

# Grow tree
library(rpart)
fit <- rpart(activity ~ ., method="class", data=train.data)

printcp(fit) # print the complexity-parameter (cp) table
plotcp(fit) # visualize cross-validation results
summary(fit) # detailed summary of splits

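# Prune the tree (in my case the pruned tree is identical to the initial model)
pfit <- prune(fit, cp=0.10) # cp value taken from the cptable
# Alternative that overrides the line above: use the cp with the lowest cross-validated error
pfit <- prune(fit,cp=fit$cptable[which.min(fit$cptable[,"xerror"]),"CP"])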

# Predict using the test dataset
pred1 <- predict(fit, test.data, type="class")

# Re-substitution confusion table (predictions on the training data itself)
table(train.data$activity, predict(fit, type="class"))

# Accuracy rate
sum(test.data$activity==pred1)/length(pred1)
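As a minimal sketch, the test-set confusion matrix is just table() applied to pred1 and the actual labels, and the accuracy above is its diagonal divided by its total. The toy vectors below are hypothetical stand-ins for test.data$activity and pred1, since the real data is not shown; giving both factors the same levels keeps the table square.

```r
# Hypothetical stand-ins for test.data$activity and pred1
lv <- c("sit", "stand", "walk")
actual <- factor(c("sit", "walk", "walk", "stand", "sit"), levels = lv)
pred1  <- factor(c("sit", "walk", "stand", "stand", "walk"), levels = lv)

# Confusion matrix: rows = predicted class, columns = actual class
cm <- table(predicted = pred1, actual = actual)
print(cm)

# Correct predictions lie on the diagonal, so this matches
# sum(actual == pred1) / length(pred1)
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```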

I would like to summarise True Positives, False Negatives, False Positives and True Negatives in a clear way. Ideally the same matrix would also show Sensitivity, Specificity, Positive Predictive Value and Negative Predictive Value.

(Figure: relationships among these terms. Source: http://en.wikipedia.org/wiki/Sensitivity_and_specificity)
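For reference, these are the standard definitions of the requested measures in terms of the four cell counts, written for one chosen positive class (for a multi-class response such as activity, each class can be treated as positive in turn, one-vs-rest):

```latex
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
\text{PPV} &= \frac{TP}{TP + FP} \\
\text{NPV} &= \frac{TN}{TN + FN} \\
\text{Accuracy} &= \frac{TP + TN}{TP + FN + FP + TN}
\end{aligned}
```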

Answers (1)

Gerry

Use predict() with your fitted model and the data frame you want to score, then cross-tabulate the predictions against the actual classes, like so:

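# newdata: the data frame to score (e.g. test.data); fit: the rpart model
pred = predict(fit, newdata, type = "vector")   # numeric class codes
newdata$pred = as.vector(pred)
activities = sort(unique(train.data$activity))  # class labels in factor-level order
newdata$prediction = activities[newdata$pred]

tab = table(newdata$prediction, newdata$activity)  # rows: predicted, columns: actual
print(tab)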
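A runnable sketch of the snippet above, assuming the rpart package is installed (it is a recommended package shipped with most R installations). The kyphosis data set bundled with rpart stands in for the question's train.data and newdata; it also checks that mapping the numeric codes gives the same labels as type = "class".

```r
library(rpart)

# kyphosis ships with rpart; it stands in for train.data and newdata here
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# type = "vector": numeric class codes, as in the answer
pred <- predict(fit, kyphosis, type = "vector")
class_labels <- sort(unique(kyphosis$Kyphosis))  # levels in factor order
prediction <- class_labels[as.vector(pred)]

# type = "class" returns the labels directly as a factor
pred_class <- predict(fit, kyphosis, type = "class")

# Rows = predicted, columns = actual
tab <- table(predicted = prediction, actual = kyphosis$Kyphosis)
print(tab)
```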

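In the predict() example above, the rpart model predicts activity, a factor variable. With type = "vector", pred is numeric: each value is the index of one level of that factor. activities = sort(unique(train.data$activity)) reproduces the default factor level order, so activities[pred] maps the codes back to class labels. Alternatively, predict(fit, newdata, type = "class") returns the labels directly as a factor, as pred1 does in the question.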
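To get the summary the question asks for, the four counts and the derived rates can be read off such a table one class at a time (one-vs-rest, since activity may have more than two levels). The helper confusion_metrics() below is hypothetical, not part of rpart; it assumes rows are predictions and columns are actual classes, and it runs on toy data.

```r
# Hypothetical helper: one-vs-rest counts and rates for one class,
# given a square table with rows = predicted and columns = actual
confusion_metrics <- function(tab, positive) {
  tp <- tab[positive, positive]        # predicted positive, actually positive
  fp <- sum(tab[positive, ]) - tp      # predicted positive, actually another class
  fn <- sum(tab[, positive]) - tp      # actually positive, predicted another class
  tn <- sum(tab) - tp - fp - fn        # all remaining cases
  c(TP = tp, FN = fn, FP = fp, TN = tn,
    Sensitivity = tp / (tp + fn),
    Specificity = tn / (tn + fp),
    PPV = tp / (tp + fp),
    NPV = tn / (tn + fn))
}

# Toy data standing in for newdata$prediction and newdata$activity;
# shared levels guarantee every class has a row and a column
lv <- c("sit", "stand", "walk")
predicted <- factor(c("walk", "walk", "sit", "stand", "walk", "sit"), levels = lv)
actual    <- factor(c("walk", "sit", "sit", "stand", "stand", "sit"), levels = lv)

tab <- table(predicted, actual)
print(tab)
m <- confusion_metrics(tab, "walk")
print(round(m, 3))
```

A rate whose denominator is zero comes out as NaN. For a two-class problem, call it once with the class you consider positive. If an extra dependency is acceptable, caret's confusionMatrix() (with its positive argument) reports the same statistics.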
