Metamatics

Reputation: 23

Create a matrix of pairwise comparisons between columns

I would like to create a matrix showing the number of row-wise differences for each pairwise comparison of columns. This is what I'm starting with:

     Ind1 Ind2 Ind3
Att1    A    A    B
Att2    A    C    C
Att3    B    B    D

This is what I want to end up with:

      Ind1  Ind2  Ind3
Ind1            
Ind2    1       
Ind3    3     2 

How can I do this in Python or R?

Upvotes: 1

Views: 688

Answers (5)

G. Grothendieck

Reputation: 269501

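1) sapply. Use a nested sapply over the column indices, counting the rows in which columns i and j differ. as.dist can optionally be applied to the result to keep only the lower triangle; the same works for the alternatives shown later, so it is not repeated for each one. The input matrix m is defined in the Note at the end.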

nc <- ncol(m)
res <- sapply(1:nc, function(i) sapply(1:nc, function(j) sum(m[, i] != m[, j])))

res
##      [,1] [,2] [,3]
## [1,]    0    1    3
## [2,]    1    0    2
## [3,]    3    2    0

or

as.dist(res)
##   1 2
## 2 1  
## 3 3 2

2) List comprehension. Using the eList package, we could generate it like this:

library(eList)

nc <- ncol(m)
Mat(for(i in 1:nc) for(j in 1:nc) sum(m[, i] != m[, j]))
##      [,1] [,2] [,3]
## [1,]    0    1    3
## [2,]    1    0    2
## [3,]    3    2    0

3) outer. We can use outer like this, with nc as defined above:

f <- function(i, j) sum(m[, i] != m[, j])
outer(1:nc, 1:nc, Vectorize(f))
##      [,1] [,2] [,3]
## [1,]    0    1    3
## [2,]    1    0    2
## [3,]    3    2    0

Note

m <- structure(c("A", "A", "B", "A", "C", "B", "B", "C", "D"), .Dim = c(3L, 
3L), .Dimnames = list(c("Att1", "Att2", "Att3"), c("Ind1", "Ind2", 
"Ind3")))
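
Since the question also asks about Python, the nested loop from option 1 can be sketched with the standard library alone. This is only an illustration under stated assumptions: the dictionary `data` and the helper `count_diffs` are hypothetical names that stand in for the question's table and are not part of the answer above.

```python
# Hypothetical stand-in for the question's table: one list per column.
data = {
    "Ind1": ["A", "A", "B"],
    "Ind2": ["A", "C", "B"],
    "Ind3": ["B", "C", "D"],
}


def count_diffs(x, y):
    """Count the positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))


names = list(data)

# Full symmetric matrix: entry [r][c] compares column r with column c.
full = [[count_diffs(data[r], data[c]) for c in names] for r in names]

for name, row in zip(names, full):
    print(name, row)
```

Like the R versions, this compares every ordered pair of columns, so each count is computed twice and the diagonal is always zero.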

Upvotes: 0

ThomasIsCoding

Reputation: 101257

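Try adist as shown below. Note that adist computes a generalized Levenshtein (edit) distance between the concatenated columns. That equals the number of differing rows for this data, but it is not guaranteed to in general (for example, with multi-character values).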

> adist(sapply(df, toString))
     Ind1 Ind2 Ind3
Ind1    0    1    3
Ind2    1    0    2
Ind3    3    2    0

Upvotes: 4

Onyambu

Reputation: 79208

Another base R approach:

x <- combn(df, 2, function(x) sum(do.call("!=", x)))

attributes(x) <- list(Labels = names(df), Size = ncol(df), class = "dist")

x
     Ind1 Ind2
Ind2    1     
Ind3    3    2

If you want the full symmetric matrix, you could do:

as.matrix(x)
     Ind1 Ind2 Ind3
Ind1    0    1    3
Ind2    1    0    2
Ind3    3    2    0

Upvotes: 1

Amit Vikram Singh

Reputation: 2128

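Use NumPy broadcasting, where df is the DataFrame from the question:

import numpy as np
import pandas as pd

arr = df.values.T                              # one row per column of df
arr = np.sum(arr[:, None] != arr, axis = -1)   # pairwise counts of differing entries
mask = np.triu(np.ones(arr.shape)) == 0        # True strictly below the diagonal
arr = np.where(mask, arr, np.nan)              # keep the lower triangle, NaN elsewhere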

>>> pd.DataFrame(data = arr, index = df.columns, columns = df.columns)
      Ind1  Ind2  Ind3
Ind1   NaN   NaN   NaN
Ind2   1.0   NaN   NaN
Ind3   3.0   2.0   NaN
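
For anyone who wants to run this end to end, here is a self-contained sketch of the same broadcasting idea. The DataFrame construction is an assumption about how the question's table would be loaded, and the variable names `diff`, `mask` and `result` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Assumed construction of the question's table (individuals as columns).
df = pd.DataFrame(
    {
        "Ind1": ["A", "A", "B"],
        "Ind2": ["A", "C", "B"],
        "Ind3": ["B", "C", "D"],
    },
    index=["Att1", "Att2", "Att3"],
)

arr = df.to_numpy().T  # shape (n_columns, n_rows): one row per individual

# Broadcast (n, 1, rows) against (1, n, rows) and count mismatches per pair.
diff = (arr[:, None, :] != arr[None, :, :]).sum(axis=-1)

# True strictly below the diagonal; everything else becomes NaN.
mask = np.tril(np.ones(diff.shape, dtype=bool), k=-1)
result = pd.DataFrame(
    np.where(mask, diff, np.nan), index=df.columns, columns=df.columns
)

print(result)
```

The NaN entries force a float dtype, which is why the counts display as 1.0, 3.0 and 2.0 rather than integers.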

Upvotes: 1

Chriss Paul

Reputation: 1101

You can try the following:

df <- read.table(header = TRUE, text = "     Ind1 Ind2 Ind3
Att1    A    A    B
Att2    A    C    C
Att3    B    B    D")

v <- apply(combn(1:ncol(df), 2), 2, function(k) sum(df[, k[1]] != df[, k[2]]))
M <- matrix(0, nrow = ncol(df), ncol = ncol(df))
M[lower.tri(M)] <- v
M

     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    1    0    0
[3,]    3    2    0
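
A rough Python counterpart of this combn idea, computing each unordered pair once, might look like the sketch below. It assumes the table is held in a plain dictionary (`data` is a hypothetical name) and writes each count to its explicit row and column position instead of filling the lower triangle from a vector.

```python
from itertools import combinations

# Hypothetical stand-in for the question's table: one list per column.
data = {
    "Ind1": ["A", "A", "B"],
    "Ind2": ["A", "C", "B"],
    "Ind3": ["B", "C", "D"],
}

names = list(data)
n = len(names)

# Start from an all-zero n x n matrix.
lower = [[0] * n for _ in range(n)]

# combinations yields each pair (i, j) with i < j exactly once.
for i, j in combinations(range(n), 2):
    mismatches = sum(a != b for a, b in zip(data[names[i]], data[names[j]]))
    lower[j][i] = mismatches  # row j, column i lies below the diagonal

for name, row in zip(names, lower):
    print(name, row)
```

Indexing by (j, i) directly means the result does not depend on the order in which the pairs happen to be generated.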

Upvotes: 1
