cliu

Reputation: 965

Find rows that contain the same values across two or three columns

I want to find rows that contain the same values across two or three columns. Here is an example dataset:

replicate(3, {sample(1:3)})
     [,1] [,2] [,3]
[1,]    3    3    2
[2,]    2    1    1
[3,]    1    2    3

In this dataset, the first and second rows contain duplicated values (3 and 1, respectively), so I want to drop them and keep only the rows whose values are all distinct (the third row in this case).

How can I achieve that? My real dataset is larger. I'd appreciate any help!

Upvotes: 0

Views: 1105

Answers (3)

G. Grothendieck

Reputation: 269694

Using m from the Note at the end, apply anyDuplicated to each row and use the result to subset the rows. anyDuplicated returns 0 if there are no duplicates and the index of the first duplicate otherwise. The exclamation mark (!) coerces 0 to FALSE and any other value to TRUE, then negates it.

m[!apply(m, 1, anyDuplicated), , drop = FALSE]
##      [,1] [,2] [,3]
## [1,]    1    2    3

or

subset(m, !apply(m, 1, anyDuplicated))
##      [,1] [,2] [,3]
## [1,]    1    2    3

Note

This is the same matrix as shown in the question but generated without using random numbers for reproducibility.

m <- matrix(c(3, 2, 1, 3, 1, 2, 2, 1, 3), 3)
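To see the intermediate values behind the one-liner, here is a minimal sketch that splits it into steps. The names first_dup, keep and result are only illustrative and do not appear in the answer above.

```r
# Same matrix as in the Note: rows are (3,3,2), (2,1,1), (1,2,3)
m <- matrix(c(3, 2, 1, 3, 1, 2, 2, 1, 3), 3)

# Index of the first duplicated element in each row, 0 if there is none
first_dup <- apply(m, 1, anyDuplicated)
print(first_dup)

# ! treats 0 as FALSE and any non-zero value as TRUE, then negates,
# so keep is TRUE only for rows without duplicates
keep <- !first_dup
print(keep)

# drop = FALSE keeps the result a matrix even when one row remains
result <- m[keep, , drop = FALSE]
print(result)
```

Only the third row has no repeated value, so it is the only row for which keep is TRUE.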

Upvotes: 2

Anoushiravan R

Reputation: 21918

Here is a tidyverse solution in case you are interested:

library(dplyr)
library(purrr)

df
     [,1] [,2] [,3]
[1,]    1    3    2
[2,]    3    1    3
[3,]    2    2    1


df %>%
  as_tibble() %>%
  mutate(dup = pmap_dbl(list(V1, V2, V3), ~ n_distinct(c(...)))) %>%
  filter(dup == 3) %>%
  select(-dup)


# A tibble: 1 x 3
     V1    V2    V3
  <int> <int> <int>
1     1     3     2
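The same count-the-distinct-values idea can be sketched in base R without extra packages. This is not part of the answer above; it assumes df is the integer matrix printed there, rebuilt here by hand, and n_unique and kept are hypothetical names.

```r
# Assumed reconstruction of the matrix shown above:
# rows are (1,3,2), (3,1,3), (2,2,1)
df <- matrix(c(1L, 3L, 2L, 3L, 1L, 2L, 2L, 3L, 1L), nrow = 3)

# Number of distinct values in each row
n_unique <- apply(df, 1, function(x) length(unique(x)))
print(n_unique)

# Keep rows in which every column holds a different value
kept <- df[n_unique == ncol(df), , drop = FALSE]
print(kept)
```

Comparing against ncol(df) rather than a hard-coded 3 lets the same filter work when the matrix has a different number of columns.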

Upvotes: 1

Ashish Baid

Reputation: 533

Here you go

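dataf <- replicate(3, sample(1:3))

# TRUE for rows in which some value occurs more than once
dup_rows <- apply(dataf, 1, function(x) max(table(x)) > 1)

# drop = FALSE keeps the result a matrix even if only one row remains
data_non_dup <- dataf[!dup_rows, , drop = FALSE]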
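One caveat with this kind of row subsetting: when exactly one row survives, R simplifies the result to a plain vector unless drop = FALSE is given. A minimal sketch, using the fixed matrix from the question instead of random data so the outcome is reproducible (as_vector and as_matrix are illustrative names):

```r
# Matrix from the question: rows are (3,3,2), (2,1,1), (1,2,3)
m <- matrix(c(3, 2, 1, 3, 1, 2, 2, 1, 3), 3)

# TRUE for rows in which some value occurs more than once
dup_rows <- apply(m, 1, function(x) max(table(x)) > 1)

# Without drop = FALSE the single remaining row becomes a vector
as_vector <- m[!dup_rows, ]
print(is.matrix(as_vector))

# With drop = FALSE the result stays a 1 x 3 matrix
as_matrix <- m[!dup_rows, , drop = FALSE]
print(dim(as_matrix))
```

Keeping the matrix shape matters if later code calls nrow() or indexes by column.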

Upvotes: 1
