Reputation: 159
I have been trying for a while now, without success, to solve a problem close to the one presented in this issue. It consists of filtering for items that are duplicated within a group, while also including the original (first) occurrence used for the comparison, using dplyr (I prefer dplyr over base R or data.table).
The solution I tried is as follows:
> library(dplyr)
> a <- data.frame(name = c("a","b","b","b","a","a"), position = c(1,2,1,2,2,2), achieved = c(1,0,0,0,1,0))
> a %>% group_by(name,achieved) %>% mutate(duplicated=duplicated(position))
# A tibble: 6 x 4
# Groups: name, achieved [3]
name position achieved duplicated
<fct> <dbl> <dbl> <lgl>
1 a 1 1 FALSE
2 b 2 0 FALSE
3 b 1 0 FALSE
4 b 2 0 TRUE
5 a 2 1 FALSE
6 a 2 0 FALSE
I know this is close to what I want, but it only flags the values that are duplicated after the first occurrence. I would like a dplyr solution that flags all duplicated values per group, including the first occurrence; seeing one would probably also help me improve my understanding of dplyr.
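To make the problem concrete, here is `duplicated()` applied to just the positions of the `name = "b", achieved = 0` group (rows 2, 3 and 4 above). `pos_b0` is a hypothetical variable name used only for this illustration:

```r
# Positions of the (name = "b", achieved = 0) group, in row order (rows 2-4).
pos_b0 <- c(2, 1, 2)

# duplicated() marks only the second and later occurrences of a value,
# so the first 2 (row 2) is not flagged.
duplicated(pos_b0)
# [1] FALSE FALSE  TRUE

# What I am after is TRUE FALSE TRUE: every occurrence of a repeated value.
```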
The desired output would be as follows:
# A tibble: 6 x 4
# Groups: name, achieved [3]
name position achieved duplicated
<fct> <dbl> <dbl> <lgl>
1 a 1 1 FALSE
2 b 2 0 TRUE
3 b 1 0 FALSE
4 b 2 0 TRUE
5 a 2 1 FALSE
6 a 2 0 FALSE
Thanks in advance.
Upvotes: 2
Views: 2794
Reputation: 206232
It seems like you want to group by all of name, position, and achieved, and then just check whether there is more than one record in that group:
a %>% group_by(name,achieved, position) %>% mutate(duplicated = n()>1)
# name position achieved duplicated
# <fct> <dbl> <dbl> <lgl>
# 1 a 1 1 FALSE
# 2 b 2 0 TRUE
# 3 b 1 0 FALSE
# 4 b 2 0 TRUE
# 5 a 2 1 FALSE
# 6 a 2 0 FALSE
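If you would rather keep the original grouping by `name` and `achieved`, a minimal sketch (assuming dplyr is installed; `res` is just an illustrative name) is to OR `duplicated()` with `duplicated(fromLast = TRUE)`. The first is TRUE when a value already appeared earlier in the group, the second when it appears again later, so together they flag every occurrence of a repeated `position`, including the first:

```r
library(dplyr)

a <- data.frame(
  name     = c("a", "b", "b", "b", "a", "a"),
  position = c(1, 2, 1, 2, 2, 2),
  achieved = c(1, 0, 0, 0, 1, 0)
)

res <- a %>%
  group_by(name, achieved) %>%
  # forward pass catches later repeats, backward pass catches the first one
  mutate(duplicated = duplicated(position) |
                      duplicated(position, fromLast = TRUE)) %>%
  ungroup()

res
```

Flagging a repeated `position` within each `name`/`achieved` group is logically the same as grouping by all three columns and testing `n() > 1`; the `fromLast` version simply stays closer to the original attempt.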
Upvotes: 3
Reputation: 14764
Try this:
a %>%
group_by_all() %>%
mutate(duplicated = n() > 1)
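`group_by_all()` groups by every column, which for `a` means `name`, `position` and `achieved`, so it gives the same result as the previous answer. Scoped verbs such as `group_by_all()` are marked superseded in current dplyr releases, although they still work. For comparison, a base R sketch of the same all-columns check applies `duplicated()` to the whole data frame in both directions (the new column name simply mirrors the one used above):

```r
a <- data.frame(
  name     = c("a", "b", "b", "b", "a", "a"),
  position = c(1, 2, 1, 2, 2, 2),
  achieved = c(1, 0, 0, 0, 1, 0)
)

# duplicated() on a data frame compares whole rows. Combining the forward
# and backward (fromLast = TRUE) passes flags every row that has an exact
# copy elsewhere, first occurrence included, like group_by_all() + n() > 1.
a$duplicated <- duplicated(a) | duplicated(a, fromLast = TRUE)

a
```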
Upvotes: 2