Reputation: 23
I would like to create a matrix showing the number of row-wise differences for each pairwise comparison of columns. This is what I'm starting with:
     Ind1 Ind2 Ind3
Att1    A    A    B
Att2    A    C    C
Att3    B    B    D
This is what I want to end up with:
     Ind1 Ind2 Ind3
Ind1
Ind2    1
Ind3    3    2
How can I do this in Python or R?
Upvotes: 1
Views: 688
Reputation: 269501
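1) sapply: Perform a double sapply over the column indices, counting the row-wise mismatches for each pair of columns. We can optionally apply as.dist to the result to show only the lower triangle; the same works for the alternatives shown later, so it is not repeated for each one. The input m is defined at the end of this answer.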
nc <- ncol(m)
res <- sapply(1:nc, function(i) sapply(1:nc, function(j) sum(m[, i] != m[, j])))
res
##      [,1] [,2] [,3]
## [1,]    0    1    3
## [2,]    1    0    2
## [3,]    3    2    0
or
as.dist(res)
##   1 2
## 2 1
## 3 3 2
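Since the question also asks about Python, here is a minimal sketch of the same double loop in plain Python. The names columns and mismatches are made up for illustration, and the data is the question's table typed in by hand.

```python
# Columns of the question's table, keyed by individual (rows are Att1..Att3).
columns = {
    "Ind1": ["A", "A", "B"],
    "Ind2": ["A", "C", "B"],
    "Ind3": ["B", "C", "D"],
}


def mismatches(x, y):
    """Count the positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))


names = list(columns)
# The outer loop picks the row of the result, the inner loop the column,
# mirroring the nested sapply calls.
res = [[mismatches(columns[i], columns[j]) for j in names] for i in names]

for name, row in zip(names, res):
    print(name, row)
```

The result is symmetric with a zero diagonal, so only the entries below the diagonal are needed for the layout the question asks for.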
2) List comprehension: Using the eList package, we could generate it like this:
library(eList)
nc <- ncol(m)
Mat(for(i in 1:nc) for(j in 1:nc) sum(m[, i] != m[, j]))
##      [,1] [,2] [,3]
## [1,]    0    1    3
## [2,]    1    0    2
## [3,]    3    2    0
3) outer: We can use outer with a vectorized version of the comparison function, like this:
f <- function(i, j) sum(m[, i] != m[, j])
outer(1:nc, 1:nc, Vectorize(f))
##      [,1] [,2] [,3]
## [1,]    0    1    3
## [2,]    1    0    2
## [3,]    3    2    0
Note: the input m used above, in reproducible form:
m <- structure(c("A", "A", "B", "A", "C", "B", "B", "C", "D"), .Dim = c(3L,
3L), .Dimnames = list(c("Att1", "Att2", "Att3"), c("Ind1", "Ind2",
"Ind3")))
Upvotes: 0
Reputation: 101257
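Try adist on the columns collapsed to strings, like below. Note that adist computes an edit (Levenshtein) distance, which agrees with the row-wise mismatch count for this data but is not the same measure in general.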
> adist(sapply(df, toString))
     Ind1 Ind2 Ind3
Ind1    0    1    3
Ind2    1    0    2
Ind3    3    2    0
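To see what the adist approach is measuring, here is a rough Python sketch that compares a textbook Levenshtein distance on the collapsed strings with a plain mismatch count. The helper names levenshtein, mismatches and collapse are made up for illustration, and the longer x and y columns are hypothetical data, not from the question.

```python
def levenshtein(s, t):
    """Textbook dynamic-programming edit distance with unit costs."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (a != b)))   # substitution or match
        prev = curr
    return prev[-1]


def mismatches(x, y):
    """Count the positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))


def collapse(col):
    # Mirrors R's toString(), which joins the elements with ", ".
    return ", ".join(col)


ind1, ind2, ind3 = ["A", "A", "B"], ["A", "C", "B"], ["B", "C", "D"]
for x, y in [(ind1, ind2), (ind1, ind3), (ind2, ind3)]:
    # The two measures agree on the question's data.
    print(levenshtein(collapse(x), collapse(y)), mismatches(x, y))

# Hypothetical longer columns: every row differs, but shifting the string
# by one element is cheaper than substituting every letter.
x = ["A", "B"] * 4
y = ["B", "A"] * 4
print(levenshtein(collapse(x), collapse(y)), mismatches(x, y))
```

In the last case the edit distance comes out smaller than the mismatch count, so the adist shortcut is best checked against an explicit comparison before relying on it for other data.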
Upvotes: 4
Reputation: 79208
Another base R approach:
x <- combn(df, 2, function(x)sum(do.call("!=", x)))
attributes(x) <- list(Labels = names(df), Size = ncol(df), class = "dist")
x
     Ind1 Ind2
Ind2    1
Ind3    3    2
If you want the full symmetric matrix, you could do:
as.matrix(x)
     Ind1 Ind2 Ind3
Ind1    0    1    3
Ind2    1    0    2
Ind3    3    2    0
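A comparable idea in Python, sketched with itertools.combinations from the standard library: compute one count per unordered pair of columns, then expand to a full symmetric table if needed. The names columns, pair_counts and full are made up for illustration, and the data is the question's table typed in by hand.

```python
from itertools import combinations

# Columns of the question's table, keyed by individual.
columns = {
    "Ind1": ["A", "A", "B"],
    "Ind2": ["A", "C", "B"],
    "Ind3": ["B", "C", "D"],
}

# One count per unordered pair of columns, visited in the order
# (Ind1, Ind2), (Ind1, Ind3), (Ind2, Ind3).
pair_counts = {
    (a, b): sum(x != y for x, y in zip(columns[a], columns[b]))
    for a, b in combinations(columns, 2)
}
print(pair_counts)

# Expand to a full symmetric table with a zero diagonal.
names = list(columns)
full = {a: {b: 0 for b in names} for a in names}
for (a, b), n in pair_counts.items():
    full[a][b] = full[b][a] = n
print(full)
```

Only n * (n - 1) / 2 comparisons are made this way, rather than the n * n of a double loop.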
Upvotes: 1
Reputation: 2128
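Use NumPy broadcasting (df is the DataFrame holding the table from the question):
import numpy as np
import pandas as pd

arr = df.values.T  # one row per column of df
arr = np.sum(arr[:, None] != arr, axis=-1)  # pairwise mismatch counts
mask = np.triu(np.ones(arr.shape)) == 0  # True strictly below the diagonal
arr = np.where(mask, arr, np.nan)  # NaN on the diagonal and above it
>>> pd.DataFrame(data=arr, index=df.columns, columns=df.columns)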
      Ind1  Ind2  Ind3
Ind1   NaN   NaN   NaN
Ind2   1.0   NaN   NaN
Ind3   3.0   2.0   NaN
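The snippet above relies on an existing df. A self-contained sketch of the same broadcasting idea, assuming the question's table is rebuilt by hand with attributes as rows, might look like this (the names counts, below and result are made up for illustration):

```python
import numpy as np
import pandas as pd

# The question's table, rebuilt by hand (assumed layout: attributes as rows).
df = pd.DataFrame(
    {"Ind1": ["A", "A", "B"], "Ind2": ["A", "C", "B"], "Ind3": ["B", "C", "D"]},
    index=["Att1", "Att2", "Att3"],
)

arr = df.to_numpy().T  # shape (n_cols, n_rows): one row per individual
# Broadcasting compares every individual with every other one, row by row;
# summing over the last axis counts the mismatches for each pair.
counts = (arr[:, None, :] != arr[None, :, :]).sum(axis=-1)

# Keep only the entries strictly below the diagonal; the rest become NaN.
below = np.tril(np.ones(counts.shape, dtype=bool), k=-1)
result = pd.DataFrame(counts, index=df.columns, columns=df.columns).where(below)
print(result)
```

The intermediate comparison array has shape (n_cols, n_cols, n_rows), so memory grows quickly with many columns; a loop over pairs may be preferable for wide tables.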
Upvotes: 1
Reputation: 1101
You can try the following, which computes one count per pair of columns and fills the lower triangle of a zero matrix:
df <- read.table(header = TRUE, text = " Ind1 Ind2 Ind3
Att1 A A B
Att2 A C C
Att3 B B D")
v <- apply(combn(1:ncol(df), 2), 2, function(k) sum(df[, k[1]] != df[, k[2]]))
M <- matrix(0, nrow = ncol(df), ncol = ncol(df))
M[lower.tri(M)] <- v
M
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    1    0    0
[3,]    3    2    0
Upvotes: 1