John Richardson

Reputation: 696

Use Factor Vector to Lookup Value in Data Frame

I have a vector

> head(gbmPred)
[1] COMPLETED DEAD COMPLETED COMPLETED COMPLETED LOW

I also have a data frame

> head(gbmPredProb)
  COLLECTION COMPLETED       DEAD        LOW
1 0.04535981 0.8639282 0.07698963 0.01372232
2 0.19031127 0.6680874 0.11708416 0.02451713
3 0.25004446 0.6789679 0.04827067 0.02271702
4 0.09625138 0.7877128 0.09906595 0.01696983
5 0.15696875 0.7617585 0.04441733 0.03685539
6 0.14157307 0.7690410 0.06057754 0.02880836

I want to create a vector by using the value of each element of gbmPred to look up the matching column in the same row of gbmPredProb:

0.8639282 0.1170841 0.6789679 0.7877128 0.7617585 0.02880836

Does anyone know how to do this in R? Appreciate the help.

EDIT: Sorry, copy-and-paste error, fixed above. The first value (.86) matches COMPLETED; the second value (.11) matches DEAD.

What I am looking for is to loop through the vector gbmPred, get each value (COMPLETED, etc.), and then look up the cell of the gbmPredProb data frame whose column name matches that value and whose row number matches the element's position in the vector.

So, the first value of gbmPred is COMPLETED: look at row 1 of gbmPredProb and get .863. The second value is DEAD: look at row 2 and get .117. The third value is COMPLETED: look at row 3 and get .678.

Upvotes: 0

Views: 235

Answers (1)

josliber

Reputation: 44320

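If you have a set of (row, col) pairs that you want to extract from a matrix or data frame, a good way to get them is to index by a 2-column matrix, where the first column holds the row numbers of the elements you want and the second column holds the corresponding column numbers. Here the row numbers are simply the positions in gbmPred, and match() turns each predicted label into the number of the column with that name: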

gbmPredProb[cbind(seq_along(gbmPred), match(gbmPred, names(gbmPredProb)))]
# [1] 0.86392820 0.11708416 0.67896790 0.78771280 0.76175850
# [6] 0.02880836
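
For anyone who wants to try this without the original model objects, here is a minimal self-contained sketch. It rebuilds only the six rows printed in the question; the factor levels are an assumption (the question does not show them), and colIdx, idx and picked are illustrative names. It also checks for labels that have no matching column, since match() would return NA for those.

```r
# Rebuild the six rows shown in the question (values copied from the post).
gbmPredProb <- data.frame(
  COLLECTION = c(0.04535981, 0.19031127, 0.25004446, 0.09625138, 0.15696875, 0.14157307),
  COMPLETED  = c(0.8639282, 0.6680874, 0.6789679, 0.7877128, 0.7617585, 0.7690410),
  DEAD       = c(0.07698963, 0.11708416, 0.04827067, 0.09906595, 0.04441733, 0.06057754),
  LOW        = c(0.01372232, 0.02451713, 0.02271702, 0.01696983, 0.03685539, 0.02880836)
)

# Assumption: the factor levels are the same as the column names.
gbmPred <- factor(
  c("COMPLETED", "DEAD", "COMPLETED", "COMPLETED", "COMPLETED", "LOW"),
  levels = names(gbmPredProb)
)

# Column position of each prediction; NA would mean a label with no matching column.
colIdx <- match(gbmPred, names(gbmPredProb))
stopifnot(!anyNA(colIdx))

# One (row, column) pair per prediction, used as a two-column index matrix.
idx <- cbind(seq_along(gbmPred), colIdx)
picked <- as.matrix(gbmPredProb)[idx]
print(picked)
```

Converting with as.matrix() first is optional here, since indexing the data frame directly with idx gives the same values; it just makes explicit that the lookup is a matrix operation.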

One advantage of this approach is that it is much faster than a row-by-row lookup on larger data frames. In the benchmark below, on 6,000 rows, the median time is about 0.44 ms versus about 117 ms, roughly a 265-fold difference:

gbmPredProb <- gbmPredProb[rep(1:6, each=1000),]  # 6000x4
gbmPred <- rep(gbmPred, each=1000)  # Length 6000
josilber <- function(mat, vec) mat[cbind(1:length(vec), match(vec, names(mat)))]
rscriven <- function(mat, vec) sapply(seq_along(vec), function(i) mat[i, as.character(vec[i])])
all.equal(josilber(gbmPredProb, gbmPred), rscriven(gbmPredProb, gbmPred))
# [1] TRUE
library(microbenchmark)
microbenchmark(josilber(gbmPredProb, gbmPred), rscriven(gbmPredProb, gbmPred))
# Unit: microseconds
#                            expr       min          lq     median         uq        max neval
#  josilber(gbmPredProb, gbmPred)   328.524    398.8545    442.065    512.949    766.082   100
#  rscriven(gbmPredProb, gbmPred) 97843.015 111478.4360 117294.079 123901.368 254645.966   100

Upvotes: 4
