Tathagata

Reputation: 2065

New subset by selecting rows based on values of a vector in R

I have a data set U1 over which I run a classifier and get a vector of labels

> pred.U1.nb.c <- predict(NB.C, U1[,2:6])
> table(pred.U1.nb.c)
pred.U1.nb.c
        S unlabeled 
      148      5852 
> head(pred.U1.nb.c)
[1] S S S S S S
Levels: S unlabeled

Now I want to pull out the rows of U1 that were classified as S and store them in a new data frame, U1.S. What is the most efficient way to do this?

Upvotes: 4

Views: 2914

Answers (2)

IRTFM

Reputation: 263331

The answer by James has elegant economy going for it and certainly works correctly with this example, but it gives undesirable results if the tested vector contains any NAs: each NA in a logical index produces a row filled with NA. (I have been bitten by this many times and been puzzled.) Here are two safer ways that avoid the NA-inclusive behavior of the "[" function:

U1[which(pred.U1.nb.c=="S"), ]

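Here which() converts the logical vector (possibly containing NAs) into an integer vector holding only the positions that are TRUE, so it contains no NAs. You can also use subset, which treats NA in the condition as FALSE: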

subset(U1 ,pred.U1.nb.c=="S")

EDIT: I suspect that using grepl would also avoid the NA concern. Perhaps:

U1[grepl("^S$", pred.U1.nb.c), ]
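To see the difference concretely, here is a minimal sketch on toy data. The data frame and the label vector below are made-up stand-ins that reuse the question's names; they are not the asker's actual data, and one NA label is planted on purpose.

```r
# Toy stand-ins for U1 and pred.U1.nb.c (hypothetical data for illustration).
U1 <- data.frame(id = 1:5, x = c(0.1, 0.4, 0.3, 0.9, 0.7))
pred.U1.nb.c <- factor(c("S", "unlabeled", NA, "S", "unlabeled"),
                       levels = c("S", "unlabeled"))

# Plain logical indexing: the NA in the index yields an extra all-NA row.
by_logical <- U1[pred.U1.nb.c == "S", ]

# which() keeps only the positions where the comparison is TRUE.
by_which <- U1[which(pred.U1.nb.c == "S"), ]

# subset() treats NA in the condition as FALSE.
by_subset <- subset(U1, pred.U1.nb.c == "S")

# grepl() returns FALSE for NA input, so no NA reaches "[".
by_grepl <- U1[grepl("^S$", pred.U1.nb.c), ]

nrow(by_logical)  # 3: rows 1 and 4 plus one row of NAs
nrow(by_which)    # 2: rows 1 and 4 only
```

Assigning any of the last three results to U1.S gives the subset asked for in the question; if the label vector is known to contain no NAs, all four forms return the same rows.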

Upvotes: 11

James

Reputation: 66834

U1[pred.U1.nb.c=="S",]

Upvotes: 3
