Reputation: 965
I want to find rows that contain the same values across two or three columns. Here is an example dataset:
replicate(3, {sample(1:3)})
     [,1] [,2] [,3]
[1,]    3    3    2
[2,]    2    1    1
[3,]    1    2    3
For this dataset, the first and second rows each contain a duplicated value (3 and 1, respectively). I want to extract those rows and discard them, keeping only the rows without duplicated values (the third row in this case).
How can I achieve that? My real dataset is larger. I'd appreciate any help!
Upvotes: 0
Views: 1105
Reputation: 269694
Using m from the Note at the end, apply anyDuplicated to each row and use the result to subset the rows. anyDuplicated returns 0 if there are no duplicates and the index of the first duplicate otherwise. The exclamation mark (!) coerces 0 to FALSE and any other value to TRUE, then negates the result, so only rows without duplicates are kept.
m[!apply(m, 1, anyDuplicated), , drop = FALSE]
##      [,1] [,2] [,3]
## [1,]    1    2    3
or
subset(m, !apply(m, 1, anyDuplicated))
##      [,1] [,2] [,3]
## [1,]    1    2    3
Note: this is the same matrix as shown in the question, but entered directly rather than generated from random numbers, for reproducibility.
m <- matrix(c(3, 2, 1, 3, 1, 2, 2, 1, 3), 3)
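To make the coercion step concrete, here is a minimal sketch that breaks the one-liner into pieces, using the same m. The names dup_index, keep, unique_rows and dup_rows are only illustrative and do not appear in the answer above; the last line also shows one possible way to set aside the duplicated rows that the question wants to extract.

```r
# Same matrix as in the Note above
m <- matrix(c(3, 2, 1, 3, 1, 2, 2, 1, 3), 3)

# Index of the first duplicated value in each row, or 0 if there is none
dup_index <- apply(m, 1, anyDuplicated)
dup_index
## [1] 2 3 0

# ! treats 0 as FALSE and any other number as TRUE, then negates
keep <- !dup_index

unique_rows <- m[keep, , drop = FALSE]   # rows without repeated values
dup_rows    <- m[!keep, , drop = FALSE]  # rows set aside as duplicated
```

drop = FALSE matters here because only one row survives; without it, R would return a plain vector instead of a one-row matrix.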
Upvotes: 2
Reputation: 21918
Here is a tidyverse solution, in case you are interested (df is the matrix printed below the library calls):
library(dplyr)
library(purrr)
     [,1] [,2] [,3]
[1,]    1    3    2
[2,]    3    1    3
[3,]    2    2    1
df %>%
  as_tibble() %>%
  mutate(dup = pmap_dbl(list(V1, V2, V3), ~ n_distinct(c(...)))) %>%
  filter(dup == 3) %>%
  select(-dup)
# A tibble: 1 x 3
     V1    V2    V3
  <int> <int> <int>
1     1     3     2
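The same count-the-distinct-values idea can also be sketched in base R, which avoids hard-coding the column names and the number of columns. This is only an illustration: it assumes df is the matrix printed above, entered by hand, and the names n_unique and result are made up for the example.

```r
# The matrix printed above, entered by hand (assumed to be df)
df <- matrix(c(1, 3, 2, 3, 1, 2, 2, 3, 1), 3)

# Number of distinct values in each row
n_unique <- apply(df, 1, function(x) length(unique(x)))

# Keep only rows in which every column holds a different value
result <- df[n_unique == ncol(df), , drop = FALSE]
```

Comparing against ncol(df) rather than a literal 3 means the filter still works if the matrix gains more columns.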
Upvotes: 1
Reputation: 533
Here you go
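dataf <- replicate(3, {sample(1:3)})
# TRUE for rows in which some value occurs more than once
dup_rows <- apply(dataf, 1, function(x) max(table(x)) > 1)
# drop = FALSE keeps the result a matrix even if only one row remains
data_non_dup <- dataf[!dup_rows, , drop = FALSE]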
Upvotes: 1