Reputation: 59
I have a dataframe that looks like this:
col1 | col2 | col3 |
---|---|---|
tn1 | a | b |
tn1 | a | c |
tn2 | d | b |
tn3 | a | b |
I want to keep only the rows that are duplicated on col1 & col2, retaining BOTH of the duplicated rows:
col1 | col2 | col3 |
---|---|---|
tn1 | a | b |
tn1 | a | c |
I've been trying to do this with unique(), distinct(), or anti_join(), but I can't figure it out.
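For reference, the example data can be reproduced with a plain data.frame (character columns assumed):
df <- data.frame(
  col1 = c("tn1", "tn1", "tn2", "tn3"),
  col2 = c("a",   "a",   "d",   "a"),
  col3 = c("b",   "c",   "b",   "b")
)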
Upvotes: 1
Views: 50
Reputation: 78917
Update: To address @r2evans's concern (see comments), here are variants that check both col1 and col2:
df[duplicated(df[,c("col1","col2")]) | duplicated(df[,c("col1","col2")], fromLast=TRUE),]
OR:
df[ave(rep(0, nrow(df)), df[,c("col1","col2")], FUN = length) > 1,]
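For the example data, both variants keep only the two tn1 rows, since (tn1, a) is the only repeated col1/col2 combination:
  col1 col2 col3
1  tn1    a    b
2  tn1    a    c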
Base R:
df[df$col1 %in% df$col1[duplicated(df$col1)],]
col1 col2 col3
1 tn1 a b
2 tn1 a c
Upvotes: 1
Reputation: 2950
With vctrs:
library(tibble)
library(vctrs)
df <- tribble(
~col1, ~col2, ~col3,
"tn1", "a", "b",
"tn1", "a", "c",
"tn2", "d", "b",
"tn3", "a", "b"
)
cols <- df[c("col1", "col2")]
dups <- vec_duplicate_detect(cols)
dups
#> [1] TRUE TRUE FALSE FALSE
df[dups,]
#> # A tibble: 2 × 3
#> col1 col2 col3
#> <chr> <chr> <chr>
#> 1 tn1 a b
#> 2 tn1 a c
Created on 2023-01-27 with reprex v2.0.2.9000
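Since vec_duplicate_detect() flags every occurrence of a duplicated key (including the first), no fromLast workaround is needed, and the subset can be written in one step (same data, same result):
df[vec_duplicate_detect(df[c("col1", "col2")]), ]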
Upvotes: 1
Reputation: 59
I found this and it worked:
library(dplyr)
df %>% group_by(col1) %>% filter(duplicated(col2) | duplicated(col2, fromLast = TRUE))
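A more direct dplyr idiom, offered here only as an alternative sketch, groups on both key columns and keeps the groups that contain more than one row:
library(dplyr)

df %>%
  group_by(col1, col2) %>%  # duplicates are defined over both columns
  filter(n() > 1) %>%       # keep only keys that appear more than once
  ungroup()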
Upvotes: 0