user2849910

Reputation: 201

Keeping only certain rows of a data frame based on a set of values

I have a data frame with an ID column and a few value columns. I would like to keep only the rows whose ID matches one of a given set of values (stored, for instance, in a vector called "keep").

For simplicity, here is an example:

df <- data.frame(ID = sample(rep(letters, each=3)), value = rnorm(n=26*3))
keep <- c("a", "d", "r", "x")

How can I create a new data frame consisting only of the rows whose ID matches one of the values in keep? I can do this for just one letter by using the which() function, but with multiple letters I get warning messages and incorrect results. I know I could loop over the data frame and extract the rows that way, but I'm wondering if there is a more elegant and efficient way of going about this. Thanks in advance.

Upvotes: 15

Views: 97284

Answers (1)

Adrian

Reputation: 3288

Try df[df$ID %in% keep, ] or subset(df, ID %in% keep). See the help pages ?match (which documents %in%) and ?sets.

Edit: Also, if this were for a single letter, you could write e.g. df[df$ID == "a", ] instead of using which().
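The warnings and wrong results with several letters most likely come from using == against the whole vector, which compares element by element and recycles the shorter vector instead of testing membership. A small sketch with made-up values (ids and keep here are illustrative, not the question's data):

```r
ids  <- c("d", "a", "x", "a", "d")
keep <- c("a", "d")

# == recycles keep to a, d, a, d, a and compares position by position.
# R also warns here because 5 is not a multiple of 2; suppressWarnings()
# is only used to keep the example quiet.
by_position <- suppressWarnings(ids == keep)
by_position    # FALSE FALSE FALSE FALSE FALSE

# %in% asks, for each element of ids, whether it occurs anywhere in keep.
by_membership <- ids %in% keep
by_membership  # TRUE TRUE FALSE TRUE TRUE
```

So == is fine for a single value, while %in% is the one to use for a set of values.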

Edit: the is.element function also works (see https://stackoverflow.com/a/19136456/610668):

> df <- data.frame(id=c("A", "B", "C"), x=c(5, 22, 88)) 
> df[df$id %in% c("B", "C"), ]
  id  x
2  B 22
3  C 88
> df[is.element(df$id, c("B", "C")), ]
  id  x
2  B 22
3  C 88
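Applied to the data from the question, this might look like the sketch below. The set.seed() call and the names kept and kept2 are additions for illustration, so that the shuffled example is reproducible:

```r
# Rebuild the question's example; set.seed() is added for reproducibility
set.seed(1)
df <- data.frame(ID = sample(rep(letters, each = 3)), value = rnorm(n = 26 * 3))
keep <- c("a", "d", "r", "x")

# Logical index: TRUE for rows whose ID is one of the values in keep
kept <- df[df$ID %in% keep, ]

# Equivalent subset() form
kept2 <- subset(df, ID %in% keep)

nrow(kept)                           # 12: three rows for each of the four letters
sort(unique(as.character(kept$ID)))  # "a" "d" "r" "x"
```

The original row names are preserved in the result; rownames(kept) <- NULL resets them if that is not wanted.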

Upvotes: 31
