Reputation: 59
I have a dataframe that looks like this:
col1 | col2 | col3 |
---|---|---|
tn1 | a | b |
tn1 | a | c |
tn2 | d | b |
tn3 | a | b |
I want to keep only the rows that are duplicated on col1 & col2, retaining BOTH of the duplicated rows:
col1 | col2 | col3 |
---|---|---|
tn1 | a | b |
tn1 | a | c |
I've been trying to do this with unique(), distinct(), or anti_join(), but I can't figure it out.
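For reference, the example data can be reproduced with a plain data.frame (character columns assumed):
df <- data.frame(
  col1 = c("tn1", "tn1", "tn2", "tn3"),
  col2 = c("a",   "a",   "d",   "a"),
  col3 = c("b",   "c",   "b",   "b")
)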
Upvotes: 1
Views: 50
Reputation: 78917
Update: To address @r2evans's concern (see comments), here are variants that check both col1 and col2:
df[duplicated(df[,c("col1","col2")]) | duplicated(df[,c("col1","col2")], fromLast=TRUE),]
OR:
df[ave(rep(0, nrow(df)), df[,c("col1","col2")], FUN = length) > 1,]
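For the example data, both variants keep only the two tn1 rows, since (tn1, a) is the only repeated col1/col2 combination:
  col1 col2 col3
1  tn1    a    b
2  tn1    a    c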
Base R:
df[df$col1 %in% df$col1[duplicated(df$col1)],]
col1 col2 col3
1 tn1 a b
2 tn1 a c
Upvotes: 1
Reputation: 2950
With vctrs:
library(tibble)
library(vctrs)
df <- tribble(
~col1, ~col2, ~col3,
"tn1", "a", "b",
"tn1", "a", "c",
"tn2", "d", "b",
"tn3", "a", "b"
)
cols <- df[c("col1", "col2")]
dups <- vec_duplicate_detect(cols)
dups
#> [1] TRUE TRUE FALSE FALSE
df[dups,]
#> # A tibble: 2 × 3
#> col1 col2 col3
#> <chr> <chr> <chr>
#> 1 tn1 a b
#> 2 tn1 a c
Created on 2023-01-27 with reprex v2.0.2.9000
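Since vec_duplicate_detect() flags every occurrence of a duplicated key (including the first), no fromLast workaround is needed, and the subset can be written in one step (same data, same result):
df[vec_duplicate_detect(df[c("col1", "col2")]), ]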
Upvotes: 1
Reputation: 59
I found this and it worked:
library(dplyr)
df %>% group_by(col1) %>% filter(duplicated(col2) | duplicated(col2, fromLast = TRUE))
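A more direct dplyr idiom, offered here only as an alternative sketch, groups on both key columns and keeps the groups that contain more than one row:
library(dplyr)

df %>%
  group_by(col1, col2) %>%  # duplicates are defined over both columns
  filter(n() > 1) %>%       # keep only keys that appear more than once
  ungroup()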
Upvotes: 0