Find and remove duplicated observations across two columns in R

Question

I have an example data set like this:


df1 <- data.frame(c1=c('a','b','c','d','e','f','g', 'h'),
         c2=c('l','m','a','g','e','q','a','d'))

and I just want a data frame with the values that appear in both c1 and c2 removed. I already know how to get the unique elements of c1 and c2, but what do I do after that to end up with something like the following:

data.frame(c1=c('b','c','f','h'), c2=c('l','m','q',NA))

akrun · Accepted Answer

One option is to get the elements common to both columns with Reduce and intersect, drop those elements from each column with %in% and !, and then pad the shorter column with NA at the end so that both columns have the same length

v1 <- Reduce(intersect, df1)
lst1 <- lapply(df1, function(x) x[!x %in% v1])
data.frame(lapply(lst1, `length<-`, max(lengths(lst1))))
#  c1   c2
#1  b    l
#2  c    m
#3  f    q
#4  h <NA>

data

df1 <- data.frame(c1=c('a','b','c','d','e','f','g', 'h'),
         c2=c('l','m','a','g','e','q','a','d'))
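
If this is needed more than once, the three steps above could be wrapped in a small function. The following is only a sketch: the name drop_shared is hypothetical and not part of the original answer, and it assumes the columns can safely be compared as character vectors (it converts them with as.character so the result does not depend on whether data.frame created factors).

```r
# Hypothetical helper (not from the original answer): remove values that
# occur in every column, then pad the shorter columns with NA.
drop_shared <- function(df) {
  cols <- lapply(df, as.character)      # compare as character, not factor
  shared <- Reduce(intersect, cols)     # values present in all columns
  kept <- lapply(cols, function(x) x[!x %in% shared])
  n <- max(lengths(kept))               # length of the longest remaining column
  data.frame(lapply(kept, `length<-`, n), stringsAsFactors = FALSE)
}

df1 <- data.frame(c1 = c('a','b','c','d','e','f','g','h'),
                  c2 = c('l','m','a','g','e','q','a','d'))
res <- drop_shared(df1)
print(res)
```

Note that values repeated within a single column but absent from the other are kept as they are; only values shared across the columns are dropped.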
