Reputation: 49
Is there any way I can use something like tidyverse's add_count() %>% filter() or distinct(), or alternatively janitor's get_dupes(), to find and keep the duplicated items of each column? There is no need to compare items across columns; each column should be considered separately.
data1 <- tribble(
~colA, ~colB,
"a", 1,
"b", 1,
"c", 2,
"c", 3
)
The expected output would be:
colA colB
c 1
Upvotes: 3
Views: 580
Reputation: 19716
You can try map_dfc, which maps over the columns and returns a data frame by column-binding the outputs:
library(tidyverse)
data1 %>%
map_dfc(~.x[duplicated(.x)])
# A tibble: 1 x 2
colA colB
<chr> <dbl>
1 c 1
However, this gives unwanted results when the columns have different numbers of duplicates, because of recycling: when an operation such as column binding needs vectors of the same length, the shorter one is repeated until it matches the longer one.
data1 <- tribble(
~colA, ~colB,
"a", 1,
"b", 1,
"c", 2,
"c", 3,
"d", 1,
)
data1 %>%
map_dfc( ~.x[duplicated(.x)])
# A tibble: 2 x 2
colA colB
<chr> <dbl>
1 c 1
2 c 1
Here colA has been recycled to match the length of colB. In such a case you are better off returning a list with map:
data1 %>%
map( ~.x[duplicated(.x)])
#output
$colA
[1] "c"
$colB
[1] 1 1
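If a rectangular result is still wanted, one possible workaround is to pad the shorter vectors with NA instead of letting them be recycled. The sketch below is only an illustration under a few assumptions: it uses base R so that it runs without extra packages (lapply could be swapped for purrr::map), it rebuilds the second data1 as a plain data.frame, and pad_to_length is a hypothetical helper, not a function from any package.

```r
# Same values as the second example, built as a base data.frame
data1 <- data.frame(
  colA = c("a", "b", "c", "c", "d"),
  colB = c(1, 1, 2, 3, 1),
  stringsAsFactors = FALSE
)

# Duplicated values per column (same idea as map(~ .x[duplicated(.x)]))
dupes <- lapply(data1, function(x) x[duplicated(x)])

# Hypothetical helper: extend a vector to length n, filling with NA
pad_to_length <- function(x, n) {
  length(x) <- n
  x
}

# Pad every column to the length of the longest one, then bind
max_len <- max(lengths(dupes))
padded <- as.data.frame(
  lapply(dupes, pad_to_length, n = max_len),
  stringsAsFactors = FALSE
)
padded
```

With this data, colA ends up as "c" followed by NA, while colB keeps both of its duplicated 1s, so nothing is silently repeated.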
Upvotes: 3
Reputation: 101209
A base R option
> list2DF(Map(function(x) x[duplicated(x)], data1))
colA colB
1 c 1
Upvotes: 0
Reputation: 1810
In base R:
duplicatedList <- lapply(data1, function(columnValues) {
unique(columnValues[duplicated(columnValues)])
})
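The unique() call matters once a value occurs more than twice in a column, because duplicated() flags every repeat after the first occurrence. A small sketch of the difference, assuming the five-row data1 from the earlier answer rebuilt as a base data.frame (the names all_dupes and unique_dupes are illustrative only):

```r
# Five-row example data as a base data.frame
data1 <- data.frame(
  colA = c("a", "b", "c", "c", "d"),
  colB = c(1, 1, 2, 3, 1),
  stringsAsFactors = FALSE
)

# Every repeated occurrence: the value 1 appears twice for colB
all_dupes <- lapply(data1, function(x) x[duplicated(x)])

# Each duplicated value reported once per column
unique_dupes <- lapply(data1, function(x) unique(x[duplicated(x)]))

str(all_dupes)
str(unique_dupes)
```

Here all_dupes$colB holds two 1s, whereas unique_dupes$colB holds a single 1; colA is "c" in both cases.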
Upvotes: 0