student24

Reputation: 252

Comparing columns with each other in the same data frame in R

I am trying to compare n columns of a data frame with each other, element by element. I want to compare the first element of the first column with the first element of the second column: if they match, the result is 1, otherwise 0. Then the second element of the first column is compared with the second element of the second column, and so on.
In this way I would like to compare column 1 with column 2, then with column 3, column 4, ... column 8. Then I would compare column 2 with column 3, column 4, and so on.

The example of the input:

   V1  V2 V3  V4 V5  V6 V7 V8
1  C   D  A   D  W   R  D   D
2  A   A  S   E  A   T  A   A
3  V   T  T   V  A   T  S   S
4  A   T  S   S  C   D  R   Y
5  C   D  R   Y  C   D  A   D
6  C   A  D   V  T   V  A   T 
7  V   T  E   T  D   V  T   V
8  A   T  A   A  E   A  T   A
9  R   V  V   W  A   S  E   A
10 W   R  D   D  V   V  W   A

This example has 8 columns, but the number of columns varies between files, so I need a script that works for any number of columns.

I have tried the following:

result <- list()
for (i in 1:(nrow(df) - 1)) {
  for (j in (i + 1):nrow(df)) {
    result[[paste(row.names(df)[i], row.names(df)[j], sep = '_')]] <- as.integer(df[, i] == df[, j])
  }
}
as.data.frame(do.call(rbind, result))

view(result)

That works well for comparing the first column with the other columns, but I would like to compare all the columns with all the other columns in one go: V1_V2 ... V1_V8, V2_V3 ... V2_V8, V3_V4 ... V3_V8, and so on.

Any suggestions are highly appreciated!

Upvotes: 0

Views: 187

Answers (1)

Ronak Shah

Reputation: 388982

If I have understood you correctly, you want to compare every column with every other column, with no repetitions (no V2_V1 if V1_V2 has already been calculated) and without comparing a column with itself (no V1_V1).

We can use combn here, which generates every unordered pair of column names exactly once:

do.call(cbind, combn(names(df), 2, function(x) {
  setNames(data.frame(as.integer(df[[x[1]]] == df[[x[2]]])), 
           paste0(x, collapse = '_'))
}, simplify = FALSE))

#   V1_V2 V1_V3 V1_V4 V1_V5 V1_V6 V1_V7 V1_V8 V2_V3 V2_V4 V2_V5 V2_V6 V2_V7 ....
#1      0     0     0     0     0     0     0     0     1     0     0     1 ....
#2      1     0     0     1     0     1     1     0     0     1     0     1 ....
#3      0     0     1     0     0     0     0     1     0     0     1     0 ....
#4      0     0     0     0     0     0     0     0     0     0     0     0 ....
#5      0     0     0     1     0     0     0     0     0     0     1     0 ....
#6      0     0     0     0     0     0     0     0     0     0     0     1 ....
#7      0     0     0     0     1     0     1     0     1     0     0     1 ....
#8      0     1     1     0     1     0     1     0     0     0     0     1 ....
#9      0     0     0     0     0     0     0     1     0     0     0     0 ....
#10     0     0     0     0     0     1     0     0     0     0     0     0 ....
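
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same combn idea. It rebuilds the example input from the question with read.table (the colClasses = "character" argument is my addition, to keep the letters as plain strings) and builds the result with lapply instead of do.call(cbind, ...). The names pairs and res are only illustrative.

```r
# Rebuild the example input from the question as a data frame of strings.
df <- read.table(text = "
V1 V2 V3 V4 V5 V6 V7 V8
C D A D W R D D
A A S E A T A A
V T T V A T S S
A T S S C D R Y
C D R Y C D A D
C A D V T V A T
V T E T D V T V
A T A A E A T A
R V V W A S E A
W R D D V V W A
", header = TRUE, colClasses = "character")

# Every unordered pair of column names, each exactly once (V1-V2, V1-V3, ...).
pairs <- combn(names(df), 2, simplify = FALSE)

# For each pair, compare the two columns element by element: 1 if equal, else 0.
res <- lapply(pairs, function(p) as.integer(df[[p[1]]] == df[[p[2]]]))

# Name each result column after the pair it compares, e.g. "V1_V2".
names(res) <- vapply(pairs, paste, character(1), collapse = "_")

res <- as.data.frame(res)
```

Because the pairs come from names(df), nothing here depends on there being exactly 8 columns: a file with k columns gives choose(k, 2) comparison columns.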

Upvotes: 1
