flightless13wings

Reputation: 109

R Compare non side-by-side duplicates in 2 columns

There are many similar questions, but I'd like to compare 2 columns and delete all the duplicates in both columns so that only the unique observations in each column are left. Note: the duplicates are not side-by-side (they are not in the same row). If possible, I would also like a list of the duplicates (not just TRUE/FALSE). Thanks!

        C1 C2
     1  a  z 
     2  c  d
     3  f  a 
     4  e  c 

would become

        C1 C2
     1  f  z
     2  e  d

with duplicate list

    duplicates: a, c 

Upvotes: 2

Views: 1292

Answers (2)

lmo

Reputation: 38500

Here is a base R method using duplicated and lapply.

# stack both columns into a single vector
temp <- unlist(df)
# get duplicated elements
myDupeVec <- unique(temp[duplicated(temp)])

# get list without duplicates
noDupesList <- lapply(df, function(i) i[!(i %in% myDupeVec)])

noDupesList
$C1
[1] "f" "e"

$C2
[1] "z" "d"

data

df <- read.table(header = TRUE, text="   C1 C2
     1  a  z 
     2  c  d
     3  f  a 
     4  e  c ", as.is=TRUE)
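The question also asks for the duplicates themselves, not just TRUE/FALSE. In the approach above, myDupeVec already holds them, so printing them in the "duplicates: a, c" format is a one-liner. A minimal sketch, assuming the example data (rebuilt here with data.frame instead of read.table so the block runs on its own) and a hypothetical variable name dupe_line:

```r
# Rebuild the example data from the question
df <- data.frame(C1 = c("a", "c", "f", "e"),
                 C2 = c("z", "d", "a", "c"),
                 stringsAsFactors = FALSE)

# Same detection as above: stack both columns, keep each repeated value once
temp <- unlist(df, use.names = FALSE)
myDupeVec <- unique(temp[duplicated(temp)])

# Format the duplicates the way the question shows them
dupe_line <- paste("duplicates:", paste(myDupeVec, collapse = ", "))
writeLines(dupe_line)
```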

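Note that lapply returns a list, not a data frame. A list is the more flexible structure here because the filtered columns can end up with different lengths: a value may be repeated more than once within a single variable, or more duplicates may be removed from one column than from the other. If the columns do end up the same length, you can use do.call with data.frame to put the result back into a rectangular structure.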

do.call(data.frame, noDupesList)
  C1 C2
1  f  z
2  e  d
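If the columns do not end up the same length, do.call(data.frame, ...) either fails with "arguments imply differing number of rows" or silently recycles values when one length divides the other. A minimal sketch, assuming NA is an acceptable filler, pads every element to a common length first; the three-element C1 below is hypothetical data, not taken from the question:

```r
# Hypothetical filtered result whose columns have different lengths
noDupesList <- list(C1 = c("f", "e", "g"), C2 = c("z", "d"))

# Extend each element with NA up to the length of the longest one
maxLen <- max(lengths(noDupesList))
padded <- lapply(noDupesList, function(x) { length(x) <- maxLen; x })

# All columns now have the same length, so data.frame no longer complains
result <- do.call(data.frame, c(padded, stringsAsFactors = FALSE))
result
```

Assigning to length() pads with NA of the vector's own type, so the character columns stay character.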

Upvotes: 0

shayaa

Reputation: 2797

Here is another approach.

where_dupe <- which(apply(df, 2, duplicated), arr.ind = TRUE)

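Gives you the row and column positions of the duplicated elements in your original data frame. Because apply runs duplicated on each column separately, an element is flagged only if it repeats a value earlier in its own column.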

col_unique <- setdiff(seq_len(ncol(df)), where_dupe[, "col"])

Gives you the columns that contain no duplicated values.

You can then extract those columns by indexing:

df[, col_unique, drop = FALSE]
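For the example data, apply(df, 2, duplicated) flags nothing, because no value repeats within C1 or within C2; "a" and "c" are duplicates only across the two columns. If the goal is specifically the values shared between the columns, a minimal sketch using base R intersect (a different technique from the one in this answer) could look like this; the names shared and kept are illustrative:

```r
df <- data.frame(C1 = c("a", "c", "f", "e"),
                 C2 = c("z", "d", "a", "c"),
                 stringsAsFactors = FALSE)

# Values occurring in both columns (repeats within one column are ignored)
shared <- intersect(df$C1, df$C2)

# Drop the shared values from each column; keep a list since lengths may differ
kept <- lapply(df, function(col) col[!(col %in% shared)])

shared   # "a" "c"
kept$C1  # "f" "e"
kept$C2  # "z" "d"
```

Unlike the unlist/duplicated approach, this leaves a value alone if it repeats only inside one column.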

Upvotes: 1
