Kira Tebbe

Reputation: 586

R: remove duplicated values across rows and columns

I've found many pages about finding duplicated elements in a list or duplicated rows in a data frame. However, I want to search for duplicated elements throughout the entire data frame. Take this as an example:

df
     coupon1    coupon2    coupon3
1         10         11         12
2         13         16         15
3         16         17         18
4         19         20         21
5         22         23         24
6         25         26         27
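To make the example reproducible, the data frame above can be rebuilt like this. It is a sketch that assumes the columns are plain numeric, since the printout doesn't show their types:

```r
# Rebuild the example data frame; 16 appears at df[3,1] and df[2,2]
df <- data.frame(
  coupon1 = c(10, 13, 16, 19, 22, 25),
  coupon2 = c(11, 16, 17, 20, 23, 26),
  coupon3 = c(12, 15, 18, 21, 24, 27)
)
df
```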

You'll notice that df[2,2] and df[3,1] have the same element (16). When I run

duplicated(df)

It returns six "FALSE"s because no entire row is duplicated, only a single element. How can I check for duplicated values across the entire data frame? I'd like to know both whether a duplicate exists and what its value is (and likewise if there are multiple duplicates).

Upvotes: 3

Views: 459

Answers (2)

Pierre L

Reputation: 28441

This finds duplicates across the whole data frame, but it scans column by column. So (3,1) is still FALSE, because it holds the first 16 encountered in that order; only the later copy at (2,2) is flagged.

m <- matrix(duplicated(unlist(df)), ncol=ncol(df))
#      [,1]  [,2]  [,3]
#[1,] FALSE FALSE FALSE
#[2,] FALSE  TRUE FALSE
#[3,] FALSE FALSE FALSE
#[4,] FALSE FALSE FALSE
#[5,] FALSE FALSE FALSE
#[6,] FALSE FALSE FALSE

You can then use m however you like, for example to extract the duplicated values:

df[m]
#[1] 16
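If you also want the first occurrence flagged, a common variant is to OR duplicated() with duplicated(fromLast = TRUE); which(arr.ind = TRUE) then returns row/column positions. A sketch using the example data (the names vals and all_dupes are only illustrative):

```r
df <- data.frame(
  coupon1 = c(10, 13, 16, 19, 22, 25),
  coupon2 = c(11, 16, 17, 20, 23, 26),
  coupon3 = c(12, 15, 18, 21, 24, 27)
)
# Flatten column by column, dropping names so values compare cleanly
vals <- unlist(df, use.names = FALSE)
# TRUE for every copy of a repeated value, including the first one
all_dupes <- matrix(duplicated(vals) | duplicated(vals, fromLast = TRUE),
                    ncol = ncol(df))
# Row/column positions of every copy
which(all_dupes, arr.ind = TRUE)
# The distinct values that occur more than once
unique(vals[duplicated(vals)])
```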

Upvotes: 2

user227710

Reputation: 3194

# stack() puts all columns into one "values" column, column by column
which(duplicated(stack(yourdf)[,1]))
#[1] 8
stack(yourdf)[,1][which(duplicated(stack(yourdf)[,1]))]
#[1] 16
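The 8 above is a position in the stacked vector. A sketch for mapping it back to the source column (via the ind column that stack() adds) and to a (row, column) pair in the original data frame, assuming all columns are atomic vectors so stack() keeps them; s and dup_pos are illustrative names:

```r
df <- data.frame(
  coupon1 = c(10, 13, 16, 19, 22, 25),
  coupon2 = c(11, 16, 17, 20, 23, 26),
  coupon3 = c(12, 15, 18, 21, 24, 27)
)
s <- stack(df)                        # columns: values, ind (source column)
dup_pos <- which(duplicated(s$values))
# Column each later copy came from
as.character(s$ind[dup_pos])
# Convert the stacked position back to (row, column) of df
arrayInd(dup_pos, dim(df))
```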

Upvotes: 1
