Reputation: 117
I am attempting multiclass classification (specifically 4 classes) in R using the xgboost package. I have done binary classification for 2 classes before, but I am unable to make it work for 4. The issue I'm having is that the output of the predict function is only probabilities, not the actual class predictions, i.e. 0-3.
prediction <- predict(xgboost.model, as.matrix(df.test[,1:(ncol(df.test)-1)]))
The final column is my target variable.
Expected
[1] 0 1 0 2 3 0 0 1
Actual
[1] 0.1940184 0.2905097 0.3002516 0.2152203 0.3094974 0.2442986 0.1251981 0.3210058
Upvotes: 2
Views: 1375
Reputation: 846
It is much easier if you use the apply function, as it saves you a lot of space and you can essentially write everything in one line. The apply family of functions lets you traverse the data in a number of ways without explicit loop constructs, which usually makes the code more concise; see the documentation on the apply functions.
But to answer your question, you can do this as an alternative:
# Use the predicted label with the highest probability.
# Assumes prediction is a data frame with one named probability column per class.
prediction$label <- apply(prediction, 1, function(x) colnames(prediction)[which.max(x)])
This will find the maximum probability for each sample and assign the class with the maximum probability to the column label.
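As a rough illustration of what that one-liner does, here is a minimal sketch on a hand-made probability table. The data frame `probs` and its values are made up for the example, and it assumes the probabilities are already arranged with one row per sample and one column per class, which is not necessarily the shape `predict` returns.

```r
# Hypothetical probability table: one row per sample, one column per class.
probs <- data.frame(
  "0" = c(0.70, 0.20, 0.10),
  "1" = c(0.10, 0.50, 0.20),
  "2" = c(0.10, 0.20, 0.30),
  "3" = c(0.10, 0.10, 0.40),
  check.names = FALSE  # keep "0".."3" as column names instead of X0..X3
)

# For each row, which.max() gives the position of the largest probability;
# indexing colnames() with it turns that position into a class label.
label <- apply(probs, 1, function(x) colnames(probs)[which.max(x)])

# Attach the labels only after computing them, so the new column
# is not itself fed into apply().
probs$label <- label
print(probs)
```

Computing `label` first and attaching it afterwards keeps the character column out of the numeric comparison.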
Upvotes: 0
Reputation: 117
For those wondering, the resolution required me to iterate through each row of the df.test data frame, as predicting in bulk did not seem to work. The code is:
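library(dplyr)  # provides %>% and select()

prediction <- data.frame()
for (l in seq_len(nrow(df.test))) {
  # Predict one row at a time; the result holds one probability per class
  prediction1 <- predict(xgboost.model, as.matrix(df.test[l, 1:(ncol(df.test) - 1)])) %>% t() %>% as.data.frame()
  colnames(prediction1) <- as.character(classes2)
  # Label the row with the class that has the highest probability
  prediction1$prediction <- names(prediction1)[apply(prediction1, 1, which.max)]
  prediction <- rbind(prediction, prediction1)
}
pred.perc <- prediction %>% dplyr::select(-c(prediction))  # probabilities only
prediction <- prediction %>% dplyr::select(prediction)     # predicted class only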
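A possible way to avoid the loop, sketched under one assumption: that `predict` on a multiclass probability model returns a single flat vector holding the class probabilities of row 1, then row 2, and so on. The eight values in the question are consistent with that layout, since each group of four sums to about 1. The names `flat_probs`, `num_class`, `prob_matrix`, and `pred_class` are made up for the example, and the vector below is copied from the question rather than produced by a model.

```r
# Stand-in for the flat output of predict(): values copied from the question.
# Assumed layout: 2 rows x 4 classes, flattened row by row.
num_class <- 4
flat_probs <- c(0.1940184, 0.2905097, 0.3002516, 0.2152203,
                0.3094974, 0.2442986, 0.1251981, 0.3210058)

# Rebuild one row per sample and one column per class.
prob_matrix <- matrix(flat_probs, ncol = num_class, byrow = TRUE)
colnames(prob_matrix) <- 0:(num_class - 1)

# max.col() returns the 1-based column of the row maximum;
# subtract 1 to get class labels in the 0-3 range.
pred_class <- max.col(prob_matrix, ties.method = "first") - 1
print(pred_class)
```

If only the labels are needed, training with the objective "multi:softmax" instead of "multi:softprob" should make `predict` return class indices directly, at the cost of losing the per-class probabilities.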
Upvotes: 1