Reputation: 2147
I have a data frame with partly duplicated IDs and a dummy variable. I want to remove all rows with duplicate IDs, but only if the dummy variable is equal to 1.
Consider the following example data:
# Example data
df <- data.frame(id = c(1, 2, 3, 3, 4, 4, 5, 5, 5),
                 values = c(0, 1, 1, 0, 0, 0, 1, 0, 0))
The output should look as follows:
# Expected output
df_expected <- df[- c(3, 7), ]
How could I remove all duplicates where values is equal to 1?
Upvotes: 1
Views: 228
Reputation: 887501
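We can build a logical index that flags rows whose 'id' occurs more than once (calling duplicated in both directions so that every occurrence is caught) and whose 'values' is 1, then drop those rows: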
i1 <- (duplicated(df$id)|duplicated(df$id, fromLast = TRUE)) & df$values == 1
df[!i1, ]
#  id values
#1  1      0
#2  2      1
#4  3      0
#5  4      0
#6  4      0
#8  5      0
#9  5      0
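To see why duplicated is called twice, the index can be split into its parts. This is only a sketch of the same approach; the intermediate names (dup_forward, dup_backward, in_dup_group, result) are made up for illustration and are not part of the answer above.

```r
# Example data from the question
df <- data.frame(id = c(1, 2, 3, 3, 4, 4, 5, 5, 5),
                 values = c(0, 1, 1, 0, 0, 0, 1, 0, 0))

# duplicated() alone flags only the second and later occurrences of an id
dup_forward <- duplicated(df$id)

# fromLast = TRUE scans from the end, so it flags all but the last occurrence
dup_backward <- duplicated(df$id, fromLast = TRUE)

# The union marks every row whose id occurs more than once
in_dup_group <- dup_forward | dup_backward

# Rows to drop: id is duplicated AND values is 1
i1 <- in_dup_group & df$values == 1
which(i1)  # rows 3 and 7

result <- df[!i1, ]
```

Using only one of the two duplicated calls would miss row 3 or row 7 here, because each call leaves one occurrence per id unflagged.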
Alternatively, use filter from dplyr after grouping by 'id':
library(dplyr)
df %>%
  group_by(id) %>%
  filter(!values | n() == 1)
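The filter condition reads as "keep a row if values is 0, or if its id occurs only once". As a rough base R sketch of the same rule, assuming ave() is an acceptable stand-in for n() (this variant is not from the answer above, and group_size and keep are illustrative names):

```r
# Example data from the question
df <- data.frame(id = c(1, 2, 3, 3, 4, 4, 5, 5, 5),
                 values = c(0, 1, 1, 0, 0, 0, 1, 0, 0))

# Number of rows sharing each row's id (the role n() plays inside group_by)
group_size <- ave(df$values, df$id, FUN = length)

# Keep a row if values is not 1, or if its id is unique
keep <- df$values != 1 | group_size == 1
result_base <- df[keep, ]
```

Unlike the dplyr pipeline, which returns a grouped tibble, this keeps a plain data frame with the original row names.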
Upvotes: 1