Reputation: 696
I have a vector
> head(gbmPred)
[1] COMPLETED DEAD COMPLETED COMPLETED COMPLETED LOW
I also have a data frame
> head(gbmPredProb)
COLLECTION COMPLETED DEAD LOW
1 0.04535981 0.8639282 0.07698963 0.01372232
2 0.19031127 0.6680874 0.11708416 0.02451713
3 0.25004446 0.6789679 0.04827067 0.02271702
4 0.09625138 0.7877128 0.09906595 0.01696983
5 0.15696875 0.7617585 0.04441733 0.03685539
6 0.14157307 0.7690410 0.06057754 0.02880836
I want to be create a vector by using the levels in gbmPred to lookup the values in gbmPredProb:
0.8639282 0.1170841 0.6789679 0.7877128 0.7617585 0.02880836
Does anyone know how to do this in R? Appreciate the help.
EDIT *** Sorry copy and paste error. Fixed above The first value .86 matches COMPLETED the second value .11 matches DEAD
WHat I am looking for is to loop through the vector gbmPred to get the value (COMPLETED,etc), then search gbmPredProb data frame for the value matching the column with the same name as well as the index of the vector.
So, the first value is COMPLETED. Look at gbmPredProb and get .863 The second value of gbmPred is DEAD. Look at gbmPredProb and get .117 the thrid value of gbmPred is COMPLETED. Look at gbmPredProb and get .678
Upvotes: 0
Views: 235
Reputation: 44320
If you have a bunch of (row, col)
pairs that you want to grab out of a matrix, a good way to get them is to index by a 2-column matrix where the first column is all the row numbers of the elements you want and the second column is all the column numbers of the elements you want:
gbmPredProb[cbind(1:length(gbmPred), match(gbmPred, names(gbmPredProb)))]
# [1] 0.86392820 0.11708416 0.67896790 0.78771280 0.76175850
# [6] 0.02880836
One advantage of this sort of an approach is that it will be a good deal quicker than a row-by-row approach on larger data frames:
gbmPredProb <- gbmPredProb[rep(1:6, each=1000),] # 6000x4
gbmPred <- rep(gbmPred, each=1000) # Length 6000
josilber <- function(mat, vec) mat[cbind(1:length(vec), match(vec, names(mat)))]
rscriven <- function(mat, vec) sapply(seq_along(vec), function(i) mat[i, as.character(vec[i])])
all.equal(josilber(gbmPredProb, gbmPred), rscriven(gbmPredProb, gbmPred))
# [1] TRUE
library(microbenchmark)
microbenchmark(josilber(gbmPredProb, gbmPred), rscriven(gbmPredProb, gbmPred))
# Unit: microseconds
# expr min lq median uq max neval
# josilber(gbmPredProb, gbmPred) 328.524 398.8545 442.065 512.949 766.082 100
# rscriven(gbmPredProb, gbmPred) 97843.015 111478.4360 117294.079 123901.368 254645.966 100
Upvotes: 4