Reputation: 45
My data = data.lab
data.lab <- data.frame(Name=c("A","e","b","c","d"),
bp =c( 12,12,11,12,11),
sugar = c(19,21,23,19,23))
I want to have only duplicate names with the reference
desired output
lab.data <- data.frame(Name=c("A","b","c","d"),
bp =c( 12,11,12,11),
sugar = c(19,23,19,23),
pair=c(1,1,2,2))
dub.data <- duplicated(data.lab) | duplicated(data.lab, fromLast = TRUE)
out.1=data.lab[dub.data, ]
this gives the duplicate data but i need a column as what are the duplicate pairs
Upvotes: 4
Views: 92
Reputation: 40171
With dplyr
, you can do:
data.lab %>%
group_by(bp, sugar) %>%
filter(n() == 2) %>%
mutate(pair = seq_along(Name))
Name bp sugar pair
<fct> <dbl> <dbl> <int>
1 A 12 19 1
2 b 11 23 1
3 c 12 19 2
4 d 11 23 2
Or:
data.lab %>%
group_by(bp, sugar) %>%
filter(n() == 2) %>%
mutate(pair = row_number())
Or if there could be more than two pairs of duplicates:
data.lab %>%
group_by(bp, sugar) %>%
filter(n() > 1) %>%
mutate(pair = seq_along(Name))
Or:
data.lab %>%
group_by(bp, sugar) %>%
filter(n() > 1) %>%
mutate(pair = row_number())
Or to group by all variables except of "Name":
data.lab %>%
group_by_at(vars(-matches("(Name)"))) %>%
filter(n() > 1) %>%
mutate(pair = seq_along(Name))
Or:
data.lab %>%
group_by_at(vars(-matches("(Name)"))) %>%
filter(n() > 1) %>%
mutate(pair = row_number())
Upvotes: 2
Reputation: 389325
Continuing from your approach , we can use ave
in base R
dat1 <- data.lab[duplicated(data.lab[c("bp", "sugar")]) |
duplicated(data.lab[c("bp", "sugar")], fromLast = TRUE) , ]
dat1$pair <- with(dat1, ave(Name, bp, sugar, FUN = seq_along))
dat1
# Name bp sugar pair
#1 A 12 19 1
#2 b 11 23 1
#3 c 12 19 2
#4 d 11 23 2
Upvotes: 1