Reputation: 109
There are many similar questions but I'd like to compare 2 columns and delete all the duplicates in both columns so that all that is left is the unique observations in each column. Note: Duplicates are not side-by-side. If possible, I would also like a list of the duplicates (not just TRUE/FALSE). Thanks!
C1 C2
1 a z
2 c d
3 f a
4 e c
would become
C1 C2
1 f z
2 e d
with duplicate list
duplicates: a, c
Upvotes: 2
Views: 1292
Reputation: 38500
Here is a base R method using duplicated
and lapply
.
temp <- unlist(df)
# get duplicated elements
myDupeVec <- unique(temp[duplicated(temp)])
# get list without duplicates
noDupesList <- lapply(df, function(i) i[!(i %in% myDupeVec)])
noDupesList
$C1
[1] "f" "e"
$C2
[1] "z" "d"
data
df <- read.table(header=T, text=" C1 C2
1 a z
2 c d
3 f a
4 e c ", as.is=TRUE)
Note that this returns a list. This is much more flexible structure, as there is generally a possibility that a level may be repeated more than once in a particular variable. If this is not the case, you can use do.call
and data.frame
to put the result into a rectangular structure.
do.call(data.frame, noDupesList)
C1 C2
1 f z
2 e d
Upvotes: 0
Reputation: 2797
Here is another answer
where_dupe <- which(apply(df, 2, duplicated), arr.ind = T)
Gives you the location of the duplicated elements within your original data frame.
col_unique <- setdiff(1:ncol(df), where_dupe)
Gives you which columns had no duplicates
You can find out the values by indexing.
df[,col_unique]
Upvotes: 1