Reputation: 21204
In addition to predicting the class labels, is it possible to also return the class probabilities for each observation in the new data when predicting?
library(caret)
knnFit <- train(Species ~ ., data = iris, method = "knn",
trControl = trainControl(method = "cv", classProbs = TRUE))
x <- predict(knnFit, newdata = iris)
This returns a factor of the predicted classes:
str(x)
Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
If I want the probabilities:
x <- predict(knnFit, newdata = iris, type = "prob")
> head(x)
setosa versicolor virginica
1 1 0 0
2 1 0 0
3 1 0 0
4 1 0 0
5 1 0 0
6 1 0 0
Is it possible to have caret return both the predictions and the probabilities? I know I can compute the classes by applying max.col to the probabilities, but I wondered whether there is a built-in way to get both.
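For reference, the max.col workaround mentioned above can be sketched in base R. The probability table here is hand-made (an assumption standing in for the output of predict(knnFit, newdata = iris, type = "prob"), so the example runs without caret). max.col returns the column index of each row's maximum, and that index looks up the class name.

```r
# Hand-made stand-in for predict(knnFit, newdata = ..., type = "prob");
# these values are illustrative, not real model output.
probs <- data.frame(
  setosa     = c(1, 0, 0, 0),
  versicolor = c(0, 2/3, 1, 0),
  virginica  = c(0, 1/3, 0, 1)
)

# max.col() returns, for each row, the index of the column holding the maximum.
# ties.method = "first" makes ties deterministic (the default, "random", is not).
idx <- max.col(as.matrix(probs), ties.method = "first")

# Convert the column indices to a factor of class labels whose levels follow
# the probability columns, then bind it next to the probabilities.
result <- cbind(probs, class = factor(names(probs)[idx], levels = names(probs)))
print(result)
```

Keeping the factor levels in the same order as the probability columns means the derived class column has the same levels as the factor returned by a plain predict() call.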
Upvotes: 7
Views: 9837
Reputation: 729
Another way to solve this, shown in Python: generate the class probabilities, then take the argmax of each row.
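import numpy as np
# Generate class probabilities (one row per observation, one column per class)
y_val_probs = predictor.predict(x_val, return_proba=True)
# Get the list of class names from the predictor's preprocessor
classes = predictor.preproc.get_classes()
# Convert probabilities to classes: pick the column with the highest probability
y_val_pred = [classes[np.argmax(pred)] for pred in y_val_probs]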
Upvotes: 1
Reputation: 8377
I'm turning my comment into an answer.
Once you have generated the table of predicted probabilities, you don't need to call the prediction function a second time to get the classes. You can add a class column by applying which.max to each row (which is fast, in my opinion). This assigns to each row the name of the column with the highest probability, i.e. one of c("setosa", "versicolor", "virginica").
This gives a single table with both pieces of information, as requested:
library(dplyr)
predict(knnFit, newdata = iris, type = "prob") %>%
mutate('class'=names(.)[apply(., 1, which.max)])
# a random sample of the resulting table:
#### setosa versicolor virginica class
#### 18 1 0.0000000 0.0000000 setosa
#### 64 0 0.6666667 0.3333333 versicolor
#### 90 0 1.0000000 0.0000000 versicolor
#### 121 0 0.0000000 1.0000000 virginica
PS: this uses the pipe operator %>% from the dplyr or magrittr packages. The dot (.) stands for the result passed in from the previous step of the pipe.
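If you'd rather not load dplyr or magrittr, the same which.max idea can be written in base R. The sketch below uses a hand-made probability table (an assumption, not real model output) with one deliberately tied row. which.max returns the first maximum, so ties go to the earlier column, which matches max.col(..., ties.method = "first"). I have not checked how caret's own class prediction breaks ties, so on tied rows the derived class might differ from predict(knnFit, newdata = ...).

```r
# Hand-made stand-in for predict(knnFit, newdata = ..., type = "prob");
# row 3 is a deliberate tie between versicolor and virginica.
probs <- data.frame(
  setosa     = c(1, 0, 0.2),
  versicolor = c(0, 0.6, 0.4),
  virginica  = c(0, 0.4, 0.4)
)

# Class names, captured before any non-numeric column is added.
prob_cols <- names(probs)

# Base-R equivalent of the dplyr pipeline: apply which.max to each row
# (MARGIN = 1) and use the winning column index to look up the class name.
by_which_max <- prob_cols[apply(probs, 1, which.max)]

# Same rule via max.col, forcing "first" so ties break the same way.
by_max_col <- prob_cols[max.col(as.matrix(probs), ties.method = "first")]

# Both pick the first column among tied maxima, so they agree here.
stopifnot(identical(by_which_max, by_max_col))

probs$class <- by_which_max
print(probs)
```

The class column is assigned last on purpose: once a character column is present, apply() would coerce the whole data frame to a character matrix.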
Upvotes: 11