Reputation: 1
I am practicing my R programming skills using Kaggle data sets, and I could use some help. I am working on the Ghosts, Ghouls, and Goblins data set, and the goal is to predict which type of monster each row represents based on a set of descriptive stats. I trained a multinomial logistic regression model on a training data set to get probability values for each of the 3 types, and now I just want to put the name of the monster in the last cell of each row in the test data set, based on the max probability from 3 columns in that row. Here is the head of my table: predProbs Table
What I have currently tried seems to populate every cell in the type column with the same value. How can I find the max probability across the columns "Ghost", "Ghoul", and "Goblin", get the name of the column containing that max value, and then populate the last cell in every row (column name: type) with that name? I want to do this for every row in the test data set. This is what I am currently trying; the plan is then to just cbind typesList with the whole list called predProbs.
for (i in nrow(predProbs)) {typesList = append(typesList, which.max(apply(predProbs[i,7:9], MARGIN = 2, max)))}
But this doesn't seem to be creating the vector that I need. Any thoughts? This is similar to this post: "find max value in a row and update new column with the max column name". Unfortunately, I'm not very fluent in SQL yet, so I'm not able to translate it to R.
Any help would be greatly appreciated. Thanks!
-Wes
Upvotes: 0
Views: 74
Reputation: 79188
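You should think of something like this. Note that which.max(i) looks at every column in the row, so it assumes predProbs holds only the three probability columns: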
t(apply(predProbs, 1, function(i) append(i, names(predProbs)[which.max(i)], length(i))))
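As for why the loop in the question gives only one value: for (i in nrow(predProbs)) runs a single time, with i equal to the number of rows, so only the last row is ever looked at. Looping over seq_len(nrow(predProbs)) would visit every row, but a vectorized version is usually simpler. Below is a minimal sketch of that idea. The data frame is a made-up stand-in for predProbs containing only the three probability columns, and probCols and typeAlt are hypothetical names I am introducing; in the real table you would select the columns by name in the same way.

```r
# Hypothetical stand-in for predProbs: only the three probability columns
# are included here; the values are invented for illustration.
predProbs <- data.frame(
  Ghost  = c(0.70, 0.10, 0.25),
  Ghoul  = c(0.20, 0.60, 0.30),
  Goblin = c(0.10, 0.30, 0.45)
)

# Names of the columns that hold the class probabilities.
probCols <- c("Ghost", "Ghoul", "Goblin")

# For each row, which.max() gives the position of the largest probability;
# indexing probCols with that position gives the matching column name.
predProbs$type <- probCols[apply(predProbs[, probCols], 1, which.max)]

# Same idea without apply(): max.col() returns the column index of each
# row maximum. ties.method = "first" makes tie-breaking deterministic.
typeAlt <- probCols[max.col(as.matrix(predProbs[, probCols]), ties.method = "first")]

print(predProbs)
```

Selecting the columns by name rather than by position (7:9) keeps the result correct even if the column order changes, and assigning to predProbs$type avoids the separate cbind step.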
Upvotes: 0