Reputation: 586
I've found many pages about finding duplicated elements in a list or duplicated rows in a data frame. However, I want to search for duplicated elements throughout the entire data frame. Take this as an example:
df
coupon1 coupon2 coupon3
1 10 11 12
2 13 16 15
3 16 17 18
4 19 20 21
5 22 23 24
6 25 26 27
You'll notice that df[2,2] and df[3,1] have the same element (16). When I run
duplicated(df)
It returns six "FALSE"s because no entire row is duplicated, just one element. How can I check for duplicated values anywhere in the data frame? I would like to know both that a duplicate exists and what its value is (and the same if there are multiple duplicates).
Upvotes: 3
Views: 459
Reputation: 28441
This will find duplicates across the whole data frame, but it searches column-wise. So df[3,1] will still be FALSE, as it is the first occurrence of 16 in column-wise order; only the later occurrence, df[2,2], is flagged.
m <- matrix(duplicated(unlist(df)), ncol=ncol(df))
# [,1] [,2] [,3]
#[1,] FALSE FALSE FALSE
#[2,] FALSE TRUE FALSE
#[3,] FALSE FALSE FALSE
#[4,] FALSE FALSE FALSE
#[5,] FALSE FALSE FALSE
#[6,] FALSE FALSE FALSE
You can then use it however you'd like, for example:
df[m]
#[1] 16
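If every occurrence should be flagged, including the first one at df[3,1], one option is to combine duplicated() with its fromLast = TRUE variant. The following is a minimal sketch, assuming the example data frame from the question; the names v, all_dupes, dupe_values and dupe_cells are illustrative only.

```r
# Rebuild the example data frame from the question.
df <- data.frame(coupon1 = c(10, 13, 16, 19, 22, 25),
                 coupon2 = c(11, 16, 17, 20, 23, 26),
                 coupon3 = c(12, 15, 18, 21, 24, 27))

# Flatten the data frame column by column into a plain vector.
v <- unlist(df, use.names = FALSE)

# TRUE for every cell whose value occurs more than once anywhere:
# scanning forwards flags later copies, scanning backwards flags earlier ones.
all_dupes <- matrix(duplicated(v) | duplicated(v, fromLast = TRUE),
                    nrow = nrow(df), dimnames = list(NULL, names(df)))

# Distinct values that are repeated somewhere in the data frame.
dupe_values <- unique(v[duplicated(v)])

# Row/column positions of every cell holding a repeated value.
dupe_cells <- which(all_dupes, arr.ind = TRUE)
```

Here dupe_cells has one row per flagged cell, with columns named row and col, so both df[3,1] and df[2,2] are reported.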
Upvotes: 2
Reputation: 3194
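# stack() flattens the data frame column by column; its first column holds the values.
which(duplicated(stack(yourdf)[,1]))
[1] 8
# Use those positions to pull out the repeated value(s).
stack(yourdf)[,1][which(duplicated(stack(yourdf)[,1]))]
[1] 16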
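Because stack() also records the source column in its ind column, the stacked position can be mapped back to a row and column name. This is a rough sketch under the assumption that all columns are numeric and of equal length, as in the question; hits and repeated are hypothetical names, and table() is added here only as one way to count occurrences.

```r
# Rebuild the example data frame from the question.
df <- data.frame(coupon1 = c(10, 13, 16, 19, 22, 25),
                 coupon2 = c(11, 16, 17, 20, 23, 26),
                 coupon3 = c(12, 15, 18, 21, 24, 27))

# Long format: `values` holds the data, `ind` the originating column.
s <- stack(df)

# Stacked positions of values already seen earlier (column-wise order).
idx <- which(duplicated(s$values))

# Translate each stacked position back to a value, column name and row.
hits <- data.frame(value  = s$values[idx],
                   column = as.character(s$ind[idx]),
                   row    = (idx - 1) %% nrow(df) + 1)

# Count how often each value occurs and keep those seen more than once.
counts <- table(s$values)
repeated <- counts[counts > 1]
```

With several duplicated values, hits gets one row per later occurrence and repeated lists each value with its total count.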
Upvotes: 1