runningbirds

Reputation: 6615

Count of matching values in matrix columns

I'm not sure how to word this correctly, but I would like to count how many times the values in each pair of matrix columns match.

Imagine I have the 3 NFL teams listed below. Zeroes are losses and ones are victories. The rows are the weeks of the NFL season. I want to create a matrix that shows how many times each NFL team had the same game outcome as each of the other teams. I was thinking m %*% t(m) would give the count of each pair of teams having the same result, but that does not appear to be correct. The new matrix would be 3x3, with dolphins-jets-bills going down the rows and across the columns. I would be ignoring the diagonal since it is meaningless.

dolphins = c(1, 0, 1)
jets = c(0, 1, 0)
bills = c(1, 1, 1)
m = matrix(c(dolphins, jets, bills), 3, 3)
colnames(m) = c("dolphins", "jets", "bills")
m
solution = matrix(c(1, 0, 2, 0, 1, 1, 2, 1, 1), 3, 3)
solution

If there's some other way to solve this, that would be great, but I'm pretty sure it can be done with linear algebra operations. I'm just stuck.

Upvotes: 2

Views: 117

Answers (2)

Marat Talipov

Reputation: 13304

You're on the right track, but the transpose goes on the left, so that columns (teams) are compared rather than rows (weeks):

result <- t(m) %*% m
         dolphins jets bills
dolphins        2    0     2
jets            0    1     1
bills           2    1     3

Alternatively,

result <- crossprod(m)
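To see why the side of the transpose matters, here is a small check on the question's matrix. It is only a sketch: the names `by_team` and `by_week` are chosen here for illustration and do not appear in the question.

```r
m <- cbind(dolphins = c(1, 0, 1), jets = c(0, 1, 0), bills = c(1, 1, 1))

by_team <- t(m) %*% m   # entry [i, j]: weeks in which teams i and j both won
by_week <- m %*% t(m)   # entry [i, j]: teams that won in both week i and week j

# crossprod(m) computes t(m) %*% m without forming the transpose explicitly
stopifnot(isTRUE(all.equal(by_team, crossprod(m))))

# by_team carries the team names; by_week has none because it compares rows
dimnames(by_team)
dimnames(by_week)
```

With a square matrix both products are 3x3, so the mistake is easy to miss; only the labels and the values reveal which dimension is being compared.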

Edit: As pointed out in a comment, teams also have the same outcome when they both lose in the same week. Shared losses can be taken into account by adding the cross product of the complement:

result <- crossprod(m) + crossprod(1-m)

If you want to have 1s on the main diagonal, just do:

diag(result) <- 1

         dolphins jets bills
dolphins        1    0     2
jets            0    1     1
bills           2    1     1
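As a sanity check, the formula can be compared against a brute-force count of matching weeks for every pair of columns. This is a minimal sketch; `count_matches`, `brute` and `agree` are illustrative names, not part of the answer above.

```r
dolphins <- c(1, 0, 1)
jets     <- c(0, 1, 0)
bills    <- c(1, 1, 1)
m <- cbind(dolphins, jets, bills)

# Brute force: for each pair of columns, count the weeks with equal outcomes
count_matches <- function(i, j) sum(m[, i] == m[, j])
idx <- seq_len(ncol(m))
brute <- outer(idx, idx, Vectorize(count_matches))
dimnames(brute) <- list(colnames(m), colnames(m))

# Linear-algebra version: shared wins plus shared losses
agree <- crossprod(m) + crossprod(1 - m)

# Both approaches should agree entry by entry
stopifnot(all(brute == agree))
```

Before `diag(result) <- 1` is applied, the diagonal of either matrix is simply the number of weeks, since every team matches itself in every week.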

Upvotes: 2

Julius Vainora

Reputation: 48211

Notice that when comparing two columns we have #{same outcomes} = nrow(m) - #{different outcomes}. For 0/1 data, the number of different outcomes is exactly the Manhattan distance between the two columns, so we can compute:

nrow(m) - dist(t(m), method = "manhattan", diag = TRUE, upper = TRUE)
#          dolphins jets bills
# dolphins        0    0     2
# jets            0    0     1
# bills           2    1     0
solution
#      [,1] [,2] [,3]
# [1,]    1    0    2
# [2,]    0    1    1
# [3,]    2    1    1
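The result above is still a `dist` object, which is why its printed diagonal is 0. If an ordinary matrix is needed, one possible approach is to convert the distances with `as.matrix()` first. This is a sketch, and the name `same` is chosen here for illustration.

```r
m <- cbind(dolphins = c(1, 0, 1), jets = c(0, 1, 0), bills = c(1, 1, 1))

# Manhattan distance between columns = number of weeks with different outcomes
d <- as.matrix(dist(t(m), method = "manhattan"))

# Same outcomes = total weeks - different outcomes
same <- nrow(m) - d

# The diagonal compares each team with itself (always nrow(m)); the question
# ignores it, so it is set to 1 here only to match the question's `solution`
diag(same) <- 1
same
```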

Upvotes: 3
