Reputation: 25
I have encountered a problem when I try to clean up a data frame in R, and I have been unable to find the answer anywhere.
I have a data frame that looks like this:
df <- data.frame(ID = c(1,1,1,1,2,2,2,2,3,3,3),
A = c("yes","yes",NA,NA,"yes","no",NA,NA,"no","no",NA,NA,"yes",NA,NA),
B = c("yes","no",NA,NA,"no","no",NA,NA,"no","yes",NA,NA,"yes",NA,NA),
time = c(0,1,0,1,0,1,0,1,0,0,1))
df
ID A B time
1 1 yes yes 0
2 1 yes no 1
3 1 <NA> <NA> 0
4 1 <NA> <NA> 1
5 2 yes no 0
6 2 no no 1
7 2 <NA> <NA> 0
8 2 <NA> <NA> 1
9 3 no no 0
10 3 <NA> <NA> 0
11 3 <NA> <NA> 1
What I want is this:
df
ID A B time
1 1 yes yes 0
2 1 yes no 1
5 2 yes no 0
6 2 no no 1
9 3 no no 0
11 3 <NA> <NA> 1
So that every ID appers twice at time 0 and 1. I cannot use something like df[complete.cases(df[,2]), ]
as this removes row 10 and unique(df)
or duplicated()
just leaves the data frame mostly unchanged.
Any ideas?
Upvotes: 0
Views: 89
Reputation: 33488
df[!duplicated(paste(df$ID, df$time)),]
# ID A B time
# 1 1 yes yes 0
# 2 1 yes no 1
# 5 2 yes no 0
# 6 2 no no 1
# 9 3 no no 0
# 11 3 <NA> <NA> 1
Reproducible data
df <- data.frame(
ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
A = c("yes", "yes", NA, NA, "yes", "no", NA, NA, "no", "no", NA),
B = c("yes", "no", NA, NA, "no", "no", NA, NA, "no", "yes", NA),
time = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1)
)
Upvotes: 1
Reputation: 388982
We can use match
:
library(dplyr)
df %>% group_by(ID) %>% slice(match(c(0, 1), time))
# ID A B time
# <int> <chr> <chr> <int>
#1 1 yes yes 0
#2 1 yes no 1
#3 2 yes no 0
#4 2 no no 1
#5 3 no no 0
#6 3 <NA> <NA> 1
data
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L,
3L), A = c("yes", "yes", "<NA>", "<NA>", "yes", "no", "<NA>",
"<NA>", "no", "<NA>", "<NA>"), B = c("yes", "no", "<NA>", "<NA>",
"no", "no", "<NA>", "<NA>", "no", "<NA>", "<NA>"), time = c(0L,
1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L)),
class = "data.frame", row.names = c(NA, -11L))
Upvotes: 1