Reputation: 21
thank you for viewing this post. I am a newbie for R language.
I want to find if one column(not specified one) is a duplicate of the other, and return a matrix with dimensions num.duplicates x 2 with each row giving both indices of any pair of duplicated variables. the matrix is organized so that first column is the lower number of the pair, and it is increasing ordered.
Let say I have a dataset
v1 v2 v3 v4 v5 v6
1 1 1 2 4 2 1
2 2 2 3 5 3 2
3 3 3 4 6 4 3
and I want this
[,1] [,2]
[1,] 1 2
[2,] 1 6
[3,] 2 6
[4,] 3 5
Please help, thank you!
Upvotes: 1
Views: 469
Reputation: 93938
Something like this I suppose:
out <- data.frame(t(combn(1:ncol(dd),2)))
out[combn(1:ncol(dd),2,FUN=function(x) all(dd[x[1]]==dd[x[2]])),]
# X1 X2
#1 1 2
#5 1 6
#9 2 6
#11 3 5
Upvotes: 1
Reputation: 25638
First, generate all possible combinatons with expand.grid
. Second, remove duplicates and sort in desired order. Third, use sapply
to find indexes of repeated columns:
kk <- expand.grid(1:ncol(df), 1:ncol(df))
nn <- kk[kk[, 1] > kk[, 2], 2:1]
nn[sapply(1:nrow(nn),
function(i) all(df[, nn[i, 1]] == df[, nn[i, 2]])), ]
Var2 Var1
2 1 2
6 1 6
12 2 6
17 3 5
The approach I propose is R-ish, but I suppose writing a simple double loop is justified for this case, especially if you recently started learning the language.
Upvotes: 0
Reputation: 206546
I feel like i'm missing something more simple, but this seems to work.
Here's the sample data.
dd <- data.frame(
v1 = 1:3, v2 = 1:3, v3 = 2:4,
v4 = 4:6, v5 = 2:4, v6 = 1:3
)
Now i'll assign each column to a group using ave()
to look for duplicates. Then I'll count the number of columns in group
groups <- ave(1:ncol(dd), as.list(as.data.frame(t(dd))), FUN=min, drop=T)
Now that I have the groups, i'll split the column indexes up by those groups, if there is more than one, i'll grab all pairwise combinations. That will create a wide matrix and I flip it to a tall-line as you desire with t()
morethanone <- function(x) length(x)>1
dups <- t(do.call(cbind,
lapply(Filter(morethanone, split(1:ncol(dd), groups)), combn, 2)
))
That returns
[,1] [,2]
[1,] 1 2
[2,] 1 6
[3,] 2 6
[4,] 3 5
as desired
Upvotes: 0