Reputation: 789
Suppose this is my dataset:
Id Weight Category
1 10.2 Pre
1 12.1 Post
2 11.3 Post
3 12.9 Pre
4 10.3 Post
4 12.3 Pre
5 11.8 Pre
How do I get rid of the rows whose Id is duplicated and whose Category is Pre? My expected final dataset would be:
Id Weight Category
1 12.1 Post
2 11.3 Post
3 12.9 Pre
4 10.3 Post
5 11.8 Pre
Upvotes: 2
Views: 225
Reputation: 887971
Using subset from base R:
subset(df[with(df, order(Id, Category == 'Pre')),], !duplicated(Id))
Id Weight Category
2 1 12.1 Post
3 2 11.3 Post
4 3 12.9 Pre
5 4 10.3 Post
7 5 11.8 Pre
Data:
df <- structure(list(
  Id = c(1L, 1L, 2L, 3L, 4L, 4L, 5L),
  Weight = c(10.2, 12.1, 11.3, 12.9, 10.3, 12.3, 11.8),
  Category = c("Pre", "Post", "Post", "Pre", "Post", "Pre", "Pre")
), class = "data.frame", row.names = c(NA, -7L))
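The rule in the question can also be written directly, without relying on row order. The following is only a sketch in base R; the names is_dup_id and result are made up for illustration. It flags every row whose Id occurs more than once, then drops the flagged rows that are also Pre.

```r
# Sample data from the question
df <- data.frame(
  Id = c(1L, 1L, 2L, 3L, 4L, 4L, 5L),
  Weight = c(10.2, 12.1, 11.3, 12.9, 10.3, 12.3, 11.8),
  Category = c("Pre", "Post", "Post", "Pre", "Post", "Pre", "Pre"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose Id appears more than once in the data
# (is_dup_id is an illustrative name, not from the answers)
is_dup_id <- df$Id %in% df$Id[duplicated(df$Id)]

# Drop rows that belong to a duplicated Id AND have Category "Pre"
result <- df[!(is_dup_id & df$Category == "Pre"), ]
result
```

Unlike the order() approach, this would drop every Pre row of a duplicated Id, so an Id with two Pre rows and no Post row would vanish entirely. Whether that is the desired behaviour depends on the data.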
Upvotes: 2
Reputation: 79311
We could use filter() with first() after grouping and arranging, since Post sorts before Pre:
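library(dplyr)

df %>%
  group_by(Id) %>%
  # within each Id, "Post" sorts before "Pre"
  arrange(Id, Category) %>%
  # keep only the rows that match the first Category of each Id
  filter(Category == first(Category))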
output:
Id Weight Category
<int> <dbl> <chr>
1 1 12.1 Post
2 2 11.3 Post
3 3 12.9 Pre
4 4 10.3 Post
5 5 11.8 Pre
Upvotes: 2
Reputation: 73802
Using by, split dat by Id, select the Post row wherever an Id has two rows, then rbind the result.
do.call(rbind, by(dat, dat$Id, function(x)
if (nrow(x) == 2) x[x$Category == 'Post', ] else x))
# Id Weight Category
# 1 1 12.1 Post
# 2 2 11.3 Post
# 3 3 12.9 Pre
# 4 4 10.3 Post
# 5 5 11.8 Pre
Data:
dat <- read.table(header=TRUE, text='
Id Weight Category
1 10.2 Pre
1 12.1 Post
2 11.3 Post
3 12.9 Pre
4 10.3 Post
4 12.3 Pre
5 11.8 Pre
')
Upvotes: 2
Reputation: 389325
You may arrange the data and then use distinct.
library(dplyr)
df %>% arrange(Id, Category) %>% distinct(Id, .keep_all = TRUE)
# Id Weight Category
#1 1 12.1 Post
#2 2 11.3 Post
#3 3 12.9 Pre
#4 4 10.3 Post
#5 5 11.8 Pre
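This works because 'Pre' > 'Post' when strings are compared, so arrange puts the Post row first within each Id and distinct keeps only that first row.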
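The string comparison behind that remark can be checked directly. The second half of the sketch below is an assumption-laden variant in base R: priority, ord and result are hypothetical names, and the explicit priority vector is only needed if the category labels did not happen to sort alphabetically in the preferred order.

```r
# Strings compare character by character: "Po" sorts before "Pr"
"Pre" > "Post"          # TRUE
sort(c("Pre", "Post"))  # "Post" "Pre"

# Sample data from the question
df <- data.frame(
  Id = c(1L, 1L, 2L, 3L, 4L, 4L, 5L),
  Weight = c(10.2, 12.1, 11.3, 12.9, 10.3, 12.3, 11.8),
  Category = c("Pre", "Post", "Post", "Pre", "Post", "Pre", "Pre"),
  stringsAsFactors = FALSE
)

# Hypothetical explicit preference: earlier entries win within an Id
priority <- c("Post", "Pre")

# Order by Id, then by position in the priority vector
ord <- order(df$Id, match(df$Category, priority))

# Keep the first (highest-priority) row of each Id
sorted <- df[ord, ]
result <- sorted[!duplicated(sorted$Id), ]
result
```

With the question's data this keeps the same five rows as the arrange and distinct pipeline; the only difference is that the preference between categories is stated explicitly instead of being left to alphabetical order.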
Upvotes: 3