Curious G.

Reputation: 877

Select duplicates based on two columns in R

I have this file:

Animal   birth
a     2015-09-25
a         NA
b     2015-08-26
b     2015-08-26
e     2015-10-18  
e        NA
d     2015-06-15
d     2015-06-15

and I need only the rows where both the Animal and the birth are duplicated, like this:

Animal   birth
b     2015-08-26
b     2015-08-26
d     2015-06-15
d     2015-06-15

I tried this code:

new.dt = dt[(duplicated(dt$Animal) | duplicated(dt$Animal, fromLast = TRUE)) &
            (duplicated(dt$birth) & !is.na(dt$birth) |
             duplicated(dt$birth, fromLast = TRUE) & !is.na(dt$birth)), ]

and I got this:

Animal   birth
a     2015-09-25
b     2015-08-26
b     2015-08-26
e     2015-10-18
d     2015-06-15
d     2015-06-15

Upvotes: 1

Views: 36

Answers (2)

akrun

Reputation: 887951

We can drop the rows with a missing birth, group by 'Animal' and 'birth', and keep only the groups that have more than one row:

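library(dplyr)
dt %>%
    na.omit() %>%                   # drop rows where birth is NA
    group_by(Animal, birth) %>%     # one group per (Animal, birth) pair
    filter(n() > 1)                 # keep groups with more than one row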
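
For readers without dplyr, the same grouping idea can be sketched in base R. This is not the answer's own code but a rough equivalent that swaps in ave() to count the rows in each (Animal, birth) group. The data frame is rebuilt from the question, with birth stored as character as an assumption; the names complete, group_size, and result are illustrative only.

```r
# Rebuild the question's data; `dt` follows the question's naming.
# Assumption: birth is stored as character, not Date.
dt <- data.frame(
  Animal = c("a", "a", "b", "b", "e", "e", "d", "d"),
  birth  = c("2015-09-25", NA, "2015-08-26", "2015-08-26",
             "2015-10-18", NA, "2015-06-15", "2015-06-15"),
  stringsAsFactors = FALSE
)

# Drop rows with a missing birth, mirroring na.omit in the pipeline above.
complete <- na.omit(dt)

# ave() applies length() within each (Animal, birth) group and
# returns the group size once per row.
group_size <- ave(seq_len(nrow(complete)),
                  complete$Animal, complete$birth,
                  FUN = length)

# Keep only the rows whose group has more than one member.
result <- complete[group_size > 1, ]
print(result)
```

The comma in complete[group_size > 1, ] matters here because this sketch uses a plain data.frame rather than a data.table.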

Upvotes: 2

IceCreamToucan

Reputation: 28705

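Your approach works if you call duplicated on the full data frame instead of on each column separately. Testing the columns one at a time means a row can pass when its Animal and its birth each recur in different rows. If you have other columns that you want to ignore, you can pass dt[, c('Animal', 'birth')] to duplicated instead of dt.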

dt[duplicated(dt) | duplicated(dt, fromLast = TRUE)]
#    Animal      birth
# 1:      b 2015-08-26
# 2:      b 2015-08-26
# 3:      d 2015-06-15
# 4:      d 2015-06-15
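
As a rough illustration of the column-subset variant mentioned above, the sketch below uses a plain data.frame (not a data.table, so the row index needs a trailing comma). The weight column and its values are hypothetical, added only to stand in for "other columns you want to ignore"; key and is_dup are illustrative names.

```r
# Question's data plus a hypothetical extra column that should not
# take part in the duplicate check.
dt <- data.frame(
  Animal = c("a", "a", "b", "b", "e", "e", "d", "d"),
  birth  = c("2015-09-25", NA, "2015-08-26", "2015-08-26",
             "2015-10-18", NA, "2015-06-15", "2015-06-15"),
  weight = c(10, 11, 12, 13, 14, 15, 16, 17),  # hypothetical values
  stringsAsFactors = FALSE
)

# Only these two columns define what counts as a duplicate.
key <- dt[, c("Animal", "birth")]

# duplicated() flags every repeat after the first occurrence;
# fromLast = TRUE flags every repeat before the last one.
# Combining both marks all members of a duplicated pair.
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

# For a data.frame the row index needs the trailing comma.
result <- dt[is_dup, ]
print(result)
```

Because whole rows of key are compared, the rows with a missing birth drop out on their own here: each Animal has only one such row, so none of them has a matching pair.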

Upvotes: 2
