Reputation: 245
I would like to remove duplicate IDs in my data based on the category column. A subset of my data is as follows:
df <- data.frame(ID=c(1,2,3,4,1,4,2),
                 category=c("a","b","c","d","b","a","a"))
df
  ID category
1  1        a
2  2        b
3  3        c
4  4        d
5  1        b
6  4        a
7  2        a
If a duplicated ID appears in category b, I need to keep that row and remove the rows with the same ID from the other categories. I have no preference when the duplicated IDs come only from categories other than b. So, my desired outcome is:
  ID category
1  2        b
2  3        c
3  4        d
4  1        b
I have already read this post: "R: Remove duplicates from a dataframe based on categories in a column", but I can't find my answer there.
Upvotes: 2
Views: 411
Reputation: 886938
We could use arrange so that the 'b' category rows are placed at the top, and then keep the distinct rows by 'ID':
library(dplyr)
df %>%
  arrange(category != 'b') %>%    # FALSE sorts first, so 'b' rows move to the top
  distinct(ID, .keep_all = TRUE)  # keep the first row encountered for each ID
-output
  ID category
1  2        b
2  1        b
3  3        c
4  4        d
Or using base R
df1 <- df[order(df$category != 'b'), ]
df1[!duplicated(df1$ID), ]
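To see why sorting on category != 'b' works, it may help to look at the intermediate steps. The sketch below rebuilds the sample data from the question and shows the logical sort key and the resulting row order. The names key, idx and result are introduced here only for illustration.

```r
# Sample data from the question
df <- data.frame(ID = c(1, 2, 3, 4, 1, 4, 2),
                 category = c("a", "b", "c", "d", "b", "a", "a"))

# Logical sort key: FALSE for 'b' rows, TRUE for every other row
key <- df$category != "b"

# order() places FALSE before TRUE and leaves ties in their original order,
# so the 'b' rows (2 and 5) come first
idx <- order(key)
idx
# 2 5 1 3 4 6 7

# duplicated() flags every occurrence of an ID after the first one,
# so for IDs 1 and 2 the 'b' row is the one that survives
df1 <- df[idx, ]
result <- df1[!duplicated(df1$ID), ]
result
```

Because ties keep their original order, IDs that never appear in category 'b' simply keep their first row from the original data (here, ID 4 keeps category 'd').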
Upvotes: 1
Reputation: 79188
In base R you could do:
subset(df, !category %in% category[ID %in% ID[category == 'b'] & category !='b'])
  ID category
1  2        b
2  3        c
3  4        d
4  1        b
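One thing to watch with the subset() call above: its condition excludes every row whose category value appears among the flagged rows (here, all of category 'a'), not just the flagged rows themselves. It gives the desired result on this sample, but an ID that occurs only in category 'a' would be dropped as well. A minimal row-wise sketch, assuming the same sample data, could look like this. The names b_ids, kept and result are introduced here for illustration.

```r
# Sample data from the question
df <- data.frame(ID = c(1, 2, 3, 4, 1, 4, 2),
                 category = c("a", "b", "c", "d", "b", "a", "a"))

# IDs that have at least one row in category 'b'
b_ids <- df$ID[df$category == "b"]

# Drop only the non-'b' rows that belong to those IDs
kept <- df[!(df$ID %in% b_ids & df$category != "b"), ]

# Any duplicates left involve IDs without a 'b' row (or repeated 'b' rows),
# so keeping the first occurrence is enough
result <- kept[!duplicated(kept$ID), ]
result
```

On the sample data this keeps the rows for IDs 2, 3, 4 and 1 (categories b, c, d and b), with their original row names 2 to 5.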
Upvotes: 0