Reputation: 935
I have a table that contains letters and numbers:
xx <- tibble (letter = c (rep ("a", 3), rep ("b", 3), rep ("c", 3)),
number = c (1, 2, 3, 1, 2, 3, 4, 5, 6))
I would like to first group data by "letter" and then check if there are any two groups that have identical values in number column. These would be groups with letters "a" and "b" in the "letter" column.
The result would look like this
xx <- tibble (letter = c (rep ("a", 3), rep ("b", 3), rep ("c", 3)),
number = c (1, 2, 3, 1, 2, 3, 4, 5, 6),
duplicated = c (rep (TRUE, 6), rep (FALSE, 3)) )
is there a way to do this elegantly in dplyr?
Upvotes: 5
Views: 1458
Reputation: 15143
You may try:
xx %>%
distinct() %>%
group_by(number) %>%
mutate(n = n()) %>%
mutate(duplicated = ifelse(n>1, TRUE, FALSE)) %>%
select(-n)
letter number duplicated
<chr> <dbl> <lgl>
1 a 1 TRUE
2 a 2 TRUE
3 a 3 TRUE
4 b 1 TRUE
5 b 2 TRUE
6 b 3 TRUE
7 c 4 FALSE
8 c 5 FALSE
9 c 6 FALSE
distinct
is for if there are some duplicated inside of group,
xx <- tibble (letter = c (rep ("a", 4), rep ("b", 3), rep ("c", 3)),
number = c (1,1, 2, 3, 1, 2, 3, 4, 4, 6))
letter number
<chr> <dbl>
1 a 1
2 a 1
3 a 2
4 a 3
5 b 1
6 b 2
7 b 3
8 c 4
9 c 4
10 c 6
xxx <- xx %>%
distinct() %>%
group_by(number) %>%
mutate(n = n()) %>%
mutate(duplicated = ifelse(n>1, TRUE, FALSE)) %>%
select(-n)
xx %>%
full_join(xxx, by = c("letter", "number"))
letter number duplicated
<chr> <dbl> <lgl>
1 a 1 TRUE
2 a 1 TRUE
3 a 2 TRUE
4 a 3 TRUE
5 b 1 TRUE
6 b 2 TRUE
7 b 3 TRUE
8 c 4 FALSE
9 c 4 FALSE
10 c 6 FALSE
xxx <- xx %>%
distinct() %>%
add_count(number) %>%
mutate(duplicated = n> 1) %>%
select(-n)
xxx
xx %>%
full_join(xxx, by = c("letter", "number"))
Upvotes: 2
Reputation: 18722
You can use cur_group()
to subset xx
and test for containment:
library(dplyr)
xx %>%
group_by(letter) %>%
mutate(duplicated = number %in% xx$number[!xx$letter %in% cur_group()]) %>%
ungroup()
cur_group()
represents the value of the current grouping variable (eg a
, b
, c
).cur_group()
to subset the vector xx$number
for those that are not part of the current group (!xx$letter %in% cur_group()
).number
is in that subset (number %in% ...
).Output
letter number duplicated
<chr> <dbl> <lgl>
1 a 1 TRUE
2 a 2 TRUE
3 a 3 TRUE
4 b 1 TRUE
5 b 2 TRUE
6 b 3 TRUE
7 c 4 FALSE
8 c 5 FALSE
9 c 6 FALSE
Upvotes: 2