Reputation: 6615
I'm not really sure how to word this correctly, but I would like to count how many times each pair of columns has the same value in the same row.
Imagine I have the 3 NFL teams listed below. Zeros are losses and ones are wins, and each row is one week of the NFL season. I want to create a matrix that shows how many times each NFL team had the same game outcome as each of the other teams. I was thinking m %*% t(m) would give the number of weeks in which each pair of teams had the same result, but that does not appear to be correct. The new matrix would be 3x3, with dolphins, jets, and bills going down the rows and across the columns. I would ignore the diagonal, since it is meaningless.
dolphins=c(1,0,1)
jets= c(0,1,0)
bills = c(1,1,1)
m=matrix(c(dolphins, jets,bills),3,3)
colnames(m)=c("dolphins","jets","bills")
m
solution = matrix(c(1,0,2,0,1,1,2,1,1),3,3)
solution
If there's some other way to solve this, that would be great, but I'm pretty sure there is a way to do this with linear algebra operations; I'm just stuck.
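As a reference to check any linear-algebra version against, the counts can also be computed directly by comparing every pair of columns. This is a minimal sketch; the helper name `same_outcome` is just for illustration and is not from the question:

```r
dolphins <- c(1, 0, 1)
jets     <- c(0, 1, 0)
bills    <- c(1, 1, 1)
m <- cbind(dolphins, jets, bills)   # same layout as the question's m

# Number of weeks in which columns i and j have the same outcome
same_outcome <- function(i, j) sum(m[, i] == m[, j])

# Evaluate same_outcome for every (i, j) pair of column indices
agree <- outer(seq_len(ncol(m)), seq_len(ncol(m)), Vectorize(same_outcome))
dimnames(agree) <- list(colnames(m), colnames(m))
agree
```

The off-diagonal entries match `solution`; the diagonal here is `nrow(m)` (a team always agrees with itself), which the question ignores anyway.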
Upvotes: 2
Views: 117
Reputation: 13304
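You're on the right track, but the product has to be taken the other way around: t(m) %*% m multiplies columns (teams) with each other, whereas m %*% t(m) multiplies rows (weeks). Entry [i, j] of t(m) %*% m counts the weeks in which teams i and j both won: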
result <- t(m) %*% m
dolphins jets bills
dolphins 2 0 2
jets 0 1 1
bills 2 1 3
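With 3 weeks and 3 teams, both m %*% t(m) and t(m) %*% m happen to be 3x3, which hides the difference between them. A sketch with a made-up fourth week (hypothetical data, not from the question) makes the shapes distinct:

```r
# Hypothetical 4-week version of the data: week 4 (a loss for everyone) is made up
m4 <- cbind(dolphins = c(1, 0, 1, 0),
            jets     = c(0, 1, 0, 0),
            bills    = c(1, 1, 1, 0))

dim(m4 %*% t(m4))   # 4 x 4: compares weeks with weeks
dim(t(m4) %*% m4)   # 3 x 3: compares teams with teams

# Entry [i, j] counts the weeks in which teams i and j BOTH won
wins_together <- t(m4) %*% m4
wins_together
```

Note that the shared loss in week 4 does not show up in `wins_together`; only shared wins are counted by this product.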
Alternatively, crossprod() computes the same product without forming t(m) explicitly:
result <- crossprod(m)
Edit: As a comment below points out, two teams also have the same outcome when they both lose in the same week. Those shared losses can be counted with the loss indicators 1 - m and added in:
result <- crossprod(m) + crossprod(1-m)
If you want to have 1s on the main diagonal, just do:
diag(result) <- 1
dolphins jets bills
dolphins 1 0 2
jets 0 1 1
bills 2 1 1
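In the question's data no two teams ever lose in the same week, so the loss term happens not to change the off-diagonal counts. A sketch on a small made-up example where week 4 is a loss for every team (hypothetical data, not from the question) shows the correction at work and compares it with a direct pairwise count:

```r
# Hypothetical data: week 4 is a shared loss for all three teams
m4 <- cbind(dolphins = c(1, 0, 1, 0),
            jets     = c(0, 1, 0, 0),
            bills    = c(1, 1, 1, 0))

shared_wins   <- crossprod(m4)       # weeks where both teams won
shared_losses <- crossprod(1 - m4)   # weeks where both teams lost
same          <- shared_wins + shared_losses

# Direct check: compare every pair of columns element by element
check <- sapply(colnames(m4), function(a)
  sapply(colnames(m4), function(b) sum(m4[, a] == m4[, b])))

all(same == check)
same
```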
Upvotes: 2
Reputation: 48211
Notice that when comparing two columns, #{same outcome} = nrow(m) - #{different outcomes}. For 0/1 data, the Manhattan distance between two columns is exactly the number of weeks in which their outcomes differ, so we can use it directly:
nrow(m) - dist(t(m), method = "manhattan", diag = TRUE, upper = TRUE)
# dolphins jets bills
# dolphins 0 0 2
# jets 0 0 1
# bills 2 1 0
solution
# [,1] [,2] [,3]
# [1,] 1 0 2
# [2,] 0 1 1
# [3,] 2 1 1
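The expression above returns a "dist" object, which stores only the off-diagonal entries, so its diagonal is displayed as 0. If an ordinary matrix is needed, one option is to expand the distances with as.matrix() before subtracting. A sketch, which also checks agreement with the crossprod() formulation:

```r
dolphins <- c(1, 0, 1)
jets     <- c(0, 1, 0)
bills    <- c(1, 1, 1)
m <- cbind(dolphins, jets, bills)

# Full symmetric matrix of Manhattan distances between teams (zero diagonal)
manhattan <- as.matrix(dist(t(m), method = "manhattan"))

# Weeks with the same outcome; the diagonal becomes nrow(m)
same <- nrow(m) - manhattan

# Should agree with shared wins + shared losses
all(same == crossprod(m) + crossprod(1 - m))
same
```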
Upvotes: 3