R Compare non side-by-side duplicates in 2 columns

Question

There are many similar questions but I'd like to compare 2 columns and delete all the duplicates in both columns so that all that is left is the unique observations in each column. Note: Duplicates are not side-by-side. If possible, I would also like a list of the duplicates (not just TRUE/FALSE). Thanks!

For example, a data frame with C1 = a, c, f, e and C2 = z, d, a, c would become

        C1 C2
     1  f  z
     2  e  d

with duplicate list

    duplicates: a, c

lmo · Accepted Answer

Here is a base R method using duplicated and lapply.

    temp <- unlist(df)
    # get duplicated elements
    myDupeVec <- unique(temp[duplicated(temp)])

    # get list without duplicates
    noDupesList <- lapply(df, function(i) i[!(i %in% myDupeVec)])

    noDupesList
    $C1
    [1] "f" "e"

    $C2
    [1] "z" "d"
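One caveat with the approach above: duplicated() on the unlisted data frame also flags a value that repeats within a single column. If only values shared between the two columns should count as duplicates, intersect() is one alternative. The following is a minimal sketch, assuming the two character columns C1 and C2 from the question; the names crossDupes and noCrossDupes are illustrative and not part of the original answer.

```r
# Example data from the question, with columns kept as character vectors
df <- data.frame(C1 = c("a", "c", "f", "e"),
                 C2 = c("z", "d", "a", "c"),
                 stringsAsFactors = FALSE)

# Values that occur in both columns, regardless of row position
crossDupes <- intersect(df$C1, df$C2)

# Drop those values from every column; the result is a list
noCrossDupes <- lapply(df, function(col) col[!(col %in% crossDupes)])

crossDupes
noCrossDupes
```

On this data the result matches the duplicated() version, since no value repeats within a column.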

data

    df <- read.table(header = TRUE, text = "   C1 C2
         1  a  z
         2  c  d
         3  f  a
         4  e  c", as.is = TRUE)

Note that this returns a list. This is a much more flexible structure, since a value may in general be repeated within a single column, in which case the filtered columns can end up with different lengths. If that is not the case, you can use do.call and data.frame to put the result into a rectangular structure.

    do.call(data.frame, noDupesList)
      C1 C2
    1  f  z
    2  e  d
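To illustrate why the list is the safer default: when the columns lose different numbers of elements, data.frame() cannot combine them directly. Below is a sketch with made-up data (df2 and the other names are hypothetical, not from the question) that pads the shorter vectors with NA before building the data frame.

```r
# Hypothetical data: "a" appears twice in C1 and once in C2
df2 <- data.frame(C1 = c("a", "a", "f", "e"),
                  C2 = c("z", "d", "a", "g"),
                  stringsAsFactors = FALSE)

# Same steps as in the answer: find duplicated values, then filter each column
temp2 <- unlist(df2)
dupes2 <- unique(temp2[duplicated(temp2)])
noDupes2 <- lapply(df2, function(i) i[!(i %in% dupes2)])

# The columns now differ in length, so do.call(data.frame, noDupes2) would fail
lengths(noDupes2)

# Pad each vector with NA up to the longest length before combining
maxLen <- max(lengths(noDupes2))
padded <- lapply(noDupes2, function(v) c(v, rep(NA_character_, maxLen - length(v))))
result2 <- do.call(data.frame, c(padded, stringsAsFactors = FALSE))
result2
```

Whether NA padding is appropriate depends on what the rectangular result is used for; keeping the list avoids the question entirely.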
