Remove duplicates conditional on dummy variable

Question

I have a data frame with partly duplicated IDs and a dummy variable. I want to remove all rows with duplicate IDs, but only if the dummy variable is equal to 1.

Consider the following example data:

# Example data
df <- data.frame(id = c(1, 2, 3, 3, 4, 4, 5, 5, 5),
                 values = c(0, 1, 1, 0, 0, 0, 1, 0, 0))

The output should look as follows:

# Expected output
df_expected <- df[- c(3, 7), ]

How could I remove all rows with duplicated IDs where values is equal to 1?

akrun · Accepted Answer

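We can build a logical index that is TRUE for rows whose 'id' occurs more than once (duplicated run from both directions, so the first occurrence is flagged too) and whose 'values' is 1, then drop those rows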

i1 <- (duplicated(df$id)|duplicated(df$id, fromLast = TRUE)) & df$values == 1
df[!i1, ]
#  id values
#1  1      0
#2  2      1
#4  3      0
#5  4      0
#6  4      0
#8  5      0
#9  5      0
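
To see why both calls to duplicated are needed, the index can be split into its parts. This is only a sketch on the example data from the question; the names is_dup, drop and result are made up here for illustration.

```r
df <- data.frame(id = c(1, 2, 3, 3, 4, 4, 5, 5, 5),
                 values = c(0, 1, 1, 0, 0, 0, 1, 0, 0))

# duplicated() alone skips the first occurrence of each id;
# OR-ing it with the fromLast = TRUE scan flags every row of a repeated id
is_dup <- duplicated(df$id) | duplicated(df$id, fromLast = TRUE)
# FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

# Rows to drop: id is repeated AND values is 1
drop <- is_dup & df$values == 1
which(drop)
# 3 7

# Row 2 (values == 1, but a unique id) is kept
result <- df[!drop, ]
```

The dropped positions match the rows removed in df_expected from the question.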

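or with filter from dplyr: within each 'id' group, keep the rows where 'values' is 0 (!values) or where the group has only a single row (n() == 1)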

library(dplyr)
df %>% 
   group_by(id) %>% 
   filter(!values|n() == 1)
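
If dplyr is not available, the same per-group logic can be approximated in base R. This is a sketch, not part of the original answer: ave() stands in for n() by returning the group size for every row, and group_size and result are names chosen here for illustration.

```r
df <- data.frame(id = c(1, 2, 3, 3, 4, 4, 5, 5, 5),
                 values = c(0, 1, 1, 0, 0, 0, 1, 0, 0))

# For each row, the number of rows that share its id (base R stand-in for n())
group_size <- ave(df$id, df$id, FUN = length)

# Keep a row if values is not 1, or if its id occurs only once
result <- df[df$values != 1 | group_size == 1, ]
```

Unlike the dplyr pipeline, this returns a plain data frame with the original row names, so no ungroup() step is involved.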
