mike

Reputation: 49

Find and keep duplicated items in each column in R

Is there any way I can use something like tidyverse's add_count() %>% filter() or distinct(), or alternatively janitor's get_dupes(), to find and keep the duplicated items of each column? There is no need to compare items of different columns with each other; each column should be considered separately.

data1 <- tribble(
  ~colA, ~colB,
  "a",   1,
  "b",   1,
  "c",   2,
  "c",   3
) 
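
For a single column I can get there with something like the snippet below (add_count() plus filter(); colB is just the example column), but I am looking for a way to do this for every column separately:

library(tidyverse)
data1 %>%
  add_count(colB) %>%   # count how often each colB value occurs
  filter(n > 1) %>%     # keep values that occur more than once
  distinct(colB)        # one row per duplicated value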

Expected output would be:

colA colB
c    1

Upvotes: 3

Views: 580

Answers (3)

missuse

Reputation: 19716

You can try map_dfc(), which maps over the columns and returns a data frame by column-binding the outputs:

library(tidyverse)
data1  %>% 
  map_dfc(~.x[duplicated(.x)])

# A tibble: 1 x 2
  colA   colB
  <chr> <dbl>
1 c         1

However, this will result in unwanted behavior when the columns contain different numbers of duplicates, because of recycling: when an operation that needs equal-length vectors (such as column binding) receives a shorter one, R repeats it until it matches the longer one.
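
The recycling rule itself can be seen in a minimal sketch (plain tibble construction, just for illustration): a length-one vector is silently repeated to the common length.

library(tibble)
tibble(colA = "c", colB = c(1, 1))

# A tibble: 2 x 2
  colA   colB
  <chr> <dbl>
1 c         1
2 c         1

With an extra row in the data, so that the columns have different numbers of duplicates, this is exactly what happens: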

data1 <- tribble(
  ~colA, ~colB,
  "a",   1,
  "b",   1,
  "c",   2,
  "c",   3,
  "d",   1
)

data1  %>% 
  map_dfc( ~.x[duplicated(.x)])

# A tibble: 2 x 2
  colA   colB
  <chr> <dbl>
1 c         1
2 c         1

Here colA has been recycled to match the length of colB. In such a case you are better off returning a list with map():

data1  %>% 
  map( ~.x[duplicated(.x)])
#output
$colA
[1] "c"

$colB
[1] 1 1

Upvotes: 3

ThomasIsCoding

Reputation: 101209

A base R option

> list2DF(Map(function(x) x[duplicated(x)], data1))
  colA colB
1    c    1
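
list2DF() was added in R 4.0.0; on older versions, a rough equivalent (assuming the per-column results have compatible lengths, as they do here) is to hand the same Map() output to do.call(data.frame, ...):

> do.call(data.frame, Map(function(x) x[duplicated(x)], data1))
  colA colB
1    c    1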

Upvotes: 0

Jonas

Reputation: 1810

In base R:

duplicatedList <- lapply(data1, function(columnValues) {
  # keep each value that appears more than once, once per column
  unique(columnValues[duplicated(columnValues)])
})
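
Printing the result gives a named list with one entry per column; on the example data each column ends up with a single distinct duplicated value:

duplicatedList
$colA
[1] "c"

$colB
[1] 1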

Upvotes: 0
