jsimpsno

Reputation: 460

Remove rows with duplicate values in any other adjacent column

How can I remove rows in which any value also appears in another column of the same row? For example,

df<-structure(list(V1 = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L), V2 = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 
2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), V3 = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L)), row.names = c(NA, -27L
), class = "data.frame")

## First eight rows
   V1 V2 V3
1   1  1  1
2   2  1  1
3   3  1  1
4   1  2  1
5   2  2  1
6   3  2  1
7   1  3  1
8   2  3  1

In the example above (only 8 rows shown), I would remove every row except rows 6 and 8, since they are the only ones with no repeated value across the columns of the same row. I would prefer a data.table solution, since my actual data frame is much larger.

Upvotes: 0

Views: 115

Answers (2)

akrun

Reputation: 886938

One option is to use combn to form every pair of columns, compare each pair element-wise, and drop the rows where any pair is equal:

library(data.table)

setDT(df)  # the row subsetting below relies on df being a data.table
df[!Reduce(`|`, combn(df, 2, FUN = function(x)
     x[[1]] == x[[2]], simplify = FALSE))]
   V1 V2 V3
1:  3  2  1
2:  2  3  1
3:  3  1  2
4:  1  3  2
5:  2  1  3
6:  1  2  3
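
To make the intermediate steps easier to follow, here is a minimal base R sketch of the same idea on the first eight rows from the question. The names `small`, `pair_equal`, `has_dup` and `kept` are made up for illustration and are not part of the answer above; it uses a plain data.frame, so the row subset needs the trailing comma.

```r
# First eight rows of the question's data, as a plain data.frame
small <- data.frame(V1 = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L),
                    V2 = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
                    V3 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L))

# One logical vector per column pair: (V1, V2), (V1, V3), (V2, V3)
pair_equal <- combn(small, 2, FUN = function(x) x[[1]] == x[[2]],
                    simplify = FALSE)

# TRUE for a row when at least one pair of columns holds the same value
has_dup <- Reduce(`|`, pair_equal)

# Keep the rows where no pair matched
kept <- small[!has_dup, ]
kept
```

On this subset only rows 6 and 8 should remain, which matches what the question asks for.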

Upvotes: 0

Ronak Shah

Reputation: 388817

You can apply anyDuplicated to each row; it returns 0 when the row contains no repeated values.

library(data.table)

setDT(df)
df[apply(df, 1, anyDuplicated) == 0]

#   V1 V2 V3
#1:  3  2  1
#2:  2  3  1
#3:  3  1  2
#4:  1  3  2
#5:  2  1  3
#6:  1  2  3 
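
As a rough illustration of why comparing against 0 works, the sketch below calls anyDuplicated on single vectors and then row by row on the first eight rows from the question. It uses base R only, and the names `small_rows` and `keep` are hypothetical. Note that apply converts the data to a matrix and loops over rows in R, so it may be slow on very large tables.

```r
# anyDuplicated returns 0 when nothing repeats,
# otherwise the index of the first repeated element
no_repeat  <- anyDuplicated(c(3L, 2L, 1L))
has_repeat <- anyDuplicated(c(2L, 1L, 1L))

# First eight rows of the question's data, as a plain data.frame
small_rows <- data.frame(V1 = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L),
                         V2 = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
                         V3 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L))

# Row-wise check: TRUE for rows whose values are all distinct
keep <- apply(small_rows, 1, anyDuplicated) == 0
small_rows[keep, ]
```

Here `no_repeat` is 0 and `has_repeat` is 3, and the final subset should again contain only rows 6 and 8.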

Upvotes: 2
