Reputation: 252
I am trying to compare n columns with each other, element by element.
I want to compare the first element of the first column with the first element of the second column: if they match, the result is 1, otherwise 0. Then the second element of the first column is compared with the second element of the second column, and so on.
In this way I would like to compare column 1 with column 2, then with column 3, column 4 ... column 8. Then I would compare column 2 with column 3, column 4 ... and so on.
An example of the input:
   V1 V2 V3 V4 V5 V6 V7 V8
1   C  D  A  D  W  R  D  D
2   A  A  S  E  A  T  A  A
3   V  T  T  V  A  T  S  S
4   A  T  S  S  C  D  R  Y
5   C  D  R  Y  C  D  A  D
6   C  A  D  V  T  V  A  T
7   V  T  E  T  D  V  T  V
8   A  T  A  A  E  A  T  A
9   R  V  V  W  A  S  E  A
10  W  R  D  D  V  V  W  A
This example has 8 columns, but the number of columns varies between files, so the script needs to work for any number of columns.
I have tried the following:
result <- list()
for (i in 1:(nrow(df) - 1)) {
  for (j in (i + 1):nrow(df)) {
    result[[paste(row.names(df)[i], row.names(df)[j], sep = '_')]] <- as.integer(df[, i] == df[, j])
  }
}
as.data.frame(do.call(rbind, result))
view(result)
That works well for comparing the first column with the other columns, but I would like to compare every column with every other column in one go: V1_V2 ... V1_V8, V2_V3 ... V2_V8, V3_V4 ... V3_V8, and so on.
Any suggestions highly appreciated!
Upvotes: 0
Views: 187
Reputation: 388982
If I have understood you correctly, you want to compare every column with every other column with no repetitions (no V2_V1 if V1_V2 is already calculated) and without comparing a column with itself (no V1_V1).
We can use combn here:
do.call(cbind, combn(names(df), 2, function(x) {
  setNames(data.frame(as.integer(df[[x[1]]] == df[[x[2]]])),
           paste0(x, collapse = '_'))
}, simplify = FALSE))
#   V1_V2 V1_V3 V1_V4 V1_V5 V1_V6 V1_V7 V1_V8 V2_V3 V2_V4 V2_V5 V2_V6 V2_V7 ....
#1      0     0     0     0     0     0     0     0     1     0     0     1 ....
#2      1     0     0     1     0     1     1     0     0     1     0     1 ....
#3      0     0     1     0     0     0     0     1     0     0     1     0 ....
#4      0     0     0     0     0     0     0     0     0     0     0     0 ....
#5      0     0     0     1     0     0     0     0     0     0     1     0 ....
#6      0     0     0     0     0     0     0     0     0     0     0     1 ....
#7      0     0     0     0     1     0     1     0     1     0     0     1 ....
#8      0     1     1     0     1     0     1     0     0     0     0     1 ....
#9      0     0     0     0     0     0     0     1     0     0     0     0 ....
#10     0     0     0     0     0     1     0     0     0     0     0     0 ....
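For comparison, here is a sketch of how the loop from the question could be adjusted to give the same pairs. The loop in the question takes its bounds from nrow(df) and its labels from row.names(df) while indexing columns, so with 10 rows and 8 columns the inner index eventually points past the last column; that appears to be why only the V1 comparisons came out. The version below uses ncol(df) and names(df) instead. The data frame is rebuilt from the example input so the snippet runs on its own; the names key and out are just illustrative, and at least two columns are assumed.

```r
# Sample data copied from the question (10 rows, 8 columns).
df <- data.frame(
  V1 = c("C", "A", "V", "A", "C", "C", "V", "A", "R", "W"),
  V2 = c("D", "A", "T", "T", "D", "A", "T", "T", "V", "R"),
  V3 = c("A", "S", "T", "S", "R", "D", "E", "A", "V", "D"),
  V4 = c("D", "E", "V", "S", "Y", "V", "T", "A", "W", "D"),
  V5 = c("W", "A", "A", "C", "C", "T", "D", "E", "A", "V"),
  V6 = c("R", "T", "T", "D", "D", "V", "V", "A", "S", "V"),
  V7 = c("D", "A", "S", "R", "A", "A", "T", "T", "E", "W"),
  V8 = c("D", "A", "S", "Y", "D", "T", "V", "A", "A", "A"),
  stringsAsFactors = FALSE
)

result <- list()
# Loop over column positions (not rows); assumes ncol(df) >= 2.
for (i in seq_len(ncol(df) - 1)) {
  for (j in (i + 1):ncol(df)) {
    # Label each pair by column names, e.g. "V1_V2".
    key <- paste(names(df)[i], names(df)[j], sep = "_")
    # 1 where the two columns agree in a row, 0 otherwise.
    result[[key]] <- as.integer(df[[i]] == df[[j]])
  }
}

# One column per pair, one row per original row.
out <- as.data.frame(result)
```

Because the bounds come from ncol(df), the same loop should work for files with a different number of columns; for 8 columns it produces choose(8, 2) = 28 comparison columns.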
Upvotes: 1