R: how to remove data within a subset in a loop

Question

df <- data.frame(id = c(1, 2, 3, 3, 3, 4), gender = c("Male", "Female", "Both", "Male", "Female", "Female"))
ids <- unique(df$id)

> df
  id gender
1  1   Male
2  2 Female
3  3   Both
4  3   Male
5  3 Female
6  4 Female

For each unique id, I want to make sure that if the corresponding genders are Both, Male and Female, then I need to remove the row corresponding to Both. In other words, my desired output is:

> df
  id gender
1  1   Male
2  2 Female
3  3   Male
4  3 Female
5  4 Female

I've tried writing a loop:

subset the df by id and store each subset into a list called sub
within each sub, check if the genders contain "Both", "Male" and "Female`
if so, remove the row with gender = "Both"
re-combine the data.frame

However, the following code does not really work and is very clunky...I'm wondering if there's an easier way by using group_by in dplyr?

sub <- list()
for(i in 1:length(ids)){
  sub[[i]] <- subset(df, id %in% ids[i])
  if(all(grepl(sub[[i]]$gender, c("Both", "Male", "Female")))){
    sub[[i]] <- sub[[i]][-which(sub[[i]]$gender == "Both"), ]
  }else sub[[i]] = sub[[i]]
}

BENY · Accepted Answer

By using dplyr

df %>% 
    group_by(id) %>% 
    mutate(A = ifelse(length(unique(gender)) >= 3 & gender == 'Both', F, T)) %>% 
    filter(A) %>% 
    select(-A)
# A tibble: 5 x 2
# Groups:   id [4]
     id gender
    
1     1   Male
2     2 Female
3     3   Male
4     3 Female
5     4 Female

R: how to remove data within a subset in a loop

Answers (2)

Related Questions