Mads Lumholdt
Mads Lumholdt

Reputation: 25

How to drop rows with conditions in R

I have encountered a problem when I try to clean up a data frame in R, and I have been unable to find the answer anywhere.

I have a data frame that looks like this:

df <- data.frame(ID = c(1,1,1,1,2,2,2,2,3,3,3), 
                  A = c("yes","yes",NA,NA,"yes","no",NA,NA,"no","no",NA,NA,"yes",NA,NA),
                  B = c("yes","no",NA,NA,"no","no",NA,NA,"no","yes",NA,NA,"yes",NA,NA), 
               time = c(0,1,0,1,0,1,0,1,0,0,1))
df
   ID    A    B time
1   1  yes  yes    0
2   1  yes   no    1
3   1 <NA> <NA>    0
4   1 <NA> <NA>    1
5   2  yes   no    0
6   2   no   no    1
7   2 <NA> <NA>    0
8   2 <NA> <NA>    1
9   3   no   no    0
10  3 <NA> <NA>    0
11  3 <NA> <NA>    1

What I want is this:

df
   ID    A    B time
1   1  yes  yes    0
2   1  yes   no    1
5   2  yes   no    0
6   2   no   no    1
9   3   no   no    0
11  3 <NA> <NA>    1

So that every ID appers twice at time 0 and 1. I cannot use something like df[complete.cases(df[,2]), ] as this removes row 10 and unique(df) or duplicated() just leaves the data frame mostly unchanged.

Any ideas?

Upvotes: 0

Views: 89

Answers (2)

s_baldur
s_baldur

Reputation: 33488

df[!duplicated(paste(df$ID, df$time)),]

#    ID    A    B time
# 1   1  yes  yes    0
# 2   1  yes   no    1
# 5   2  yes   no    0
# 6   2   no   no    1
# 9   3   no   no    0
# 11  3 <NA> <NA>    1

Reproducible data

df <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3),  
  A    = c("yes", "yes", NA, NA, "yes", "no", NA, NA, "no", "no", NA), 
  B    = c("yes", "no", NA, NA, "no", "no", NA, NA, "no", "yes", NA),  
  time = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1)
)

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388982

We can use match :

library(dplyr)
df %>% group_by(ID) %>% slice(match(c(0, 1), time))

#     ID A     B      time
#  <int> <chr> <chr> <int>
#1     1 yes   yes       0
#2     1 yes   no        1
#3     2 yes   no        0
#4     2 no    no        1
#5     3 no    no        0
#6     3 <NA>  <NA>      1

data

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 
3L), A = c("yes", "yes", "<NA>", "<NA>", "yes", "no", "<NA>", 
"<NA>", "no", "<NA>", "<NA>"), B = c("yes", "no", "<NA>", "<NA>", 
"no", "no", "<NA>", "<NA>", "no", "<NA>", "<NA>"), time = c(0L, 
1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L)), 
class = "data.frame", row.names = c(NA, -11L))

Upvotes: 1

Related Questions