R Subset duplicates in a row and return duplicates and index

Question

I have the data frame shown below:

> df
  c1 c2 c3 c4
1  1  0  1  2
2  2  1  4  3
3  3  3  5  4
4  4  3  6  5
5  5  4  7  7

I am trying to subset the data frame to return the duplicate elements on a row-by-row basis. That is, I want the row numbers together with the elements that are duplicated within each of those rows. Something like this:

index  duplicates
1       1
3       3
5       7

I have tried the code below:

dfc <- apply(df, 1, function(x) duplicated(x))
dfc <- t(dfc)
df[dfc]

[1] 3 1 7

I would like the corresponding row index of each duplicate element to be returned as well, especially in cases where more than two elements in a row are duplicated.

akrun · Accepted Answer

Based on the example (assuming there is only one pair of duplicates per row), we extract the duplicated elements of each row with apply and MARGIN = 1. The output is a list, because some rows don't have any duplicates and their entries have length 0. From this list, create a data.frame by taking as 'index' the positions in 'l1' whose length is not 0, and as 'duplicates' the result of unlisting 'l1'.

l1 <- apply(df, 1, FUN = function(x) x[duplicated(x)])
data.frame(index = which(lengths(l1) != 0), duplicates = unlist(l1))
#  index duplicates
#1     1          1
#3     3          3
#5     5          7
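
The code above relies on each row having at most one duplicated value: with two or more, which() and unlist() return vectors of different lengths and data.frame() fails. A minimal sketch of one way to relax that assumption is shown below. The helper name row_duplicates and the extra row c(8, 8, 9, 9) are hypothetical, added only for illustration; the idea is to repeat each row number once per duplicate found in that row.

```r
# Data frame from the question
df <- data.frame(
  c1 = c(1, 2, 3, 4, 5),
  c2 = c(0, 1, 3, 3, 4),
  c3 = c(1, 4, 5, 6, 7),
  c4 = c(2, 3, 4, 5, 7)
)

# Hypothetical helper (not part of the original answer)
row_duplicates <- function(d) {
  # One list element per row: the distinct values that occur more than once
  l1 <- lapply(seq_len(nrow(d)), function(i) {
    x <- unlist(d[i, ], use.names = FALSE)
    unique(x[duplicated(x)])
  })
  # Repeat each row number once per duplicated value found in that row
  data.frame(
    index      = rep(seq_along(l1), lengths(l1)),
    duplicates = unlist(l1)
  )
}

res <- row_duplicates(df)

# Hypothetical extra row containing two different duplicated values
res2 <- row_duplicates(rbind(df, c(8, 8, 9, 9)))
```

For the question's data, res pairs rows 1, 3 and 5 with the values 1, 3 and 7; in res2 the added row 6 appears twice, once for 8 and once for 9. The unique() call reports a value that occurs three or more times in a row only once; dropping it would list each extra occurrence separately.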
