Reputation: 9793
df <- data.frame(id = c(1, 2, 3, 3, 3, 4), gender = c("Male", "Female", "Both", "Male", "Female", "Female"))
ids <- unique(df$id)
> df
id gender
1 1 Male
2 2 Female
3 3 Both
4 3 Male
5 3 Female
6 4 Female
For each unique id
, I want to make sure that if the corresponding gender
s are Both
, Male
and Female
, then I need to remove the row corresponding to Both
. In other words, my desired output is:
> df
id gender
1 1 Male
2 2 Female
3 3 Male
4 3 Female
5 4 Female
I've tried writing a loop:
subset the df
by id
and store each subset into a list called sub
within each sub
, check if the genders contain "Both", "Male" and "Female`
if so, remove the row with gender = "Both"
re-combine the data.frame
However, the following code does not really work and is very clunky...I'm wondering if there's an easier way by using group_by
in dplyr
?
sub <- list()
for(i in 1:length(ids)){
sub[[i]] <- subset(df, id %in% ids[i])
if(all(grepl(sub[[i]]$gender, c("Both", "Male", "Female")))){
sub[[i]] <- sub[[i]][-which(sub[[i]]$gender == "Both"), ]
}else sub[[i]] = sub[[i]]
}
Upvotes: 0
Views: 197
Reputation: 1463
In addition to the tidyverse solution, here a solution using lapply
:
result <- lapply(ids,function(x){
tmp <- df[df$id == x,]
if(all(c("Both","Male", "Female") %in% tmp$gender)){
tmp <- tmp[tmp$gender != "Both",]
}
return(tmp)
})
do.call("rbind",result)
# id gender
# 1 1 Male
# 2 2 Female
# 4 3 Male
# 5 3 Female
# 6 4 Female
Upvotes: 0
Reputation: 323236
By using dplyr
df %>%
group_by(id) %>%
mutate(A = ifelse(length(unique(gender)) >= 3 & gender == 'Both', F, T)) %>%
filter(A) %>%
select(-A)
# A tibble: 5 x 2
# Groups: id [4]
id gender
<dbl> <chr>
1 1 Male
2 2 Female
3 3 Male
4 3 Female
5 4 Female
Upvotes: 2