Reputation: 357
I have the data frame shown below:
> df
c1 c2 c3 c4
1 1 0 1 2
2 2 1 4 3
3 3 3 5 4
4 4 3 6 5
5 5 4 7 7
I am trying to subset the data frame to return the duplicate elements on a row-by-row basis. That is, I want to return the row numbers and the elements that are duplicated within each row. Something like this:
index duplicates
1 1
3 3
5 7
I have tried using the line of code below:
dfc <- apply(df, 1, function(x) duplicated(x))
dfc <- t(dfc)
df[dfc]
[1] 3 1 7
I would like the corresponding row index of the duplicate elements to be returned as well, especially when a row contains more than one duplicated element.
Upvotes: 1
Views: 249
Reputation: 886938
Based on the example (assuming there is only one pair of duplicates per row), we extract the duplicated elements for each row with apply and MARGIN=1. The output is a list, because some rows don't have any duplicates and return a vector of length 0. From that list ('l1'), create a data.frame by taking the 'index' of the elements of 'l1' whose length is not 0 and the 'duplicates' from unlisting 'l1'.
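# for each row, keep the values flagged by duplicated(); rows without duplicates give length 0
l1 <- apply(df, 1, FUN = function(x) x[duplicated(x)])
# row positions with at least one duplicate, paired with the duplicated values
data.frame(index = which(lengths(l1) != 0), duplicates = unlist(l1))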
# index duplicates
#1 1 1
#3 3 3
#5 5 7
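If a row can hold more than one duplicated value, the one-pair assumption above no longer holds, because which() returns one index per row while unlist() returns one value per duplicate. A minimal sketch of one way to handle that case follows. It assumes each duplicated value should be reported once per row (hence unique()), and the extra sixth row is hypothetical, added only to show a row with two duplicated values. It loops over rows with lapply() instead of apply(), since apply() simplifies its result to a vector or matrix when every row yields the same number of duplicates.

```r
# Data from the question
df <- data.frame(c1 = c(1, 2, 3, 4, 5),
                 c2 = c(0, 1, 3, 3, 4),
                 c3 = c(1, 4, 5, 6, 7),
                 c4 = c(2, 3, 4, 5, 7))

# Hypothetical extra row with two different duplicated values (8 and 9)
df2 <- rbind(df, data.frame(c1 = 8, c2 = 8, c3 = 9, c4 = 9))

# One list element per row: the distinct values that occur more than once
dups <- lapply(seq_len(nrow(df2)), function(i) {
  x <- unlist(df2[i, ], use.names = FALSE)
  unique(x[duplicated(x)])
})

# Repeat each row position once per duplicated value found in that row
res <- data.frame(index = rep(seq_along(dups), lengths(dups)),
                  duplicates = unlist(dups))
res
#   index duplicates
# 1     1          1
# 2     3          3
# 3     5          7
# 4     6          8
# 5     6          9
```

Here 'index' is the row position; if the data frame has meaningful row names, rownames(df2)[res$index] would recover them.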
Upvotes: 1