Maxl Gemeinderat

Reputation: 555

Find highest Cosine Similarity in R

I have computed the cosine similarity of tweets, which I have already put in my_matrix. Now I want to get the highest similarity scores.

cos = cosine(my_matrix)
cos

cos gives me a matrix array with all the values in it. The output looks like this:

           1         2         3         4         5         6         7         8
1  1.0000000  0.5568073  0.3901539  0.5621206  0.2816833  0.2160066  0.2605051  0.2115766
2  0.5568073  1.0000000  0.6526458  0.7140950  0.4307470  0.3033117  0.2941557  0.3437280
3  0.3901539  0.6526458  1.0000000  0.5650099  0.3252116  0.2494666  0.2453746  0.3903765
4  0.5621206  0.7140950  0.5650099  1.0000000  0.4033797  0.2911018  0.3459270  0.3239339
5  0.2816833  0.4307470  0.3252116  0.4033797  1.0000000  0.2501818  0.1925585  0.1905618
6  0.2160066  0.3033117  0.2494666  0.2911018  0.2501818  1.0000000  0.1378479  0.2054312
7  0.2605051  0.2941557  0.2453746  0.3459270  0.1925585  0.1378479  1.0000000  0.1320529
8  0.2115766  0.3437280  0.3903765  0.3239339  0.1905618  0.2054312  0.1320529  1.0000000
9  0.4836184  0.6940823  0.5820808  0.7131646  0.4122365  0.2808218  0.3132991  0.3311042
10 0.3097645  0.3486836  0.2695222  0.3268555  0.1954665  0.1239200  0.1436308  0.1333930

Now I want to iterate through this matrix and get the highest value, excluding the 1s on the diagonal (row 1, column 1 = 1; row 2, column 2 = 1; and so on).

The output I want in this example is 0.7140950 in row 4 and column 2, as it is the largest value below 1. So far I have tried a double for-loop to iterate over the rows and columns, but this doesn't work at all and I don't know how to go on.

biggest_value = 0 

for(row in 1:nrow(party_m)) {
  for(col in 1:ncol(party_m)) {
        if(my_matrix[row, col] > biggest_value ){
           biggest_value = my_matriy[row,col]
        }
  }
}

Does anybody have a solution for this?

Upvotes: 1

Views: 337

Answers (2)

gung - Reinstate Monica

Reputation: 11893

Your code may not work because of a typo: biggest_value = my_matriy[row,col] should be biggest_value = my_matrix[row,col], although I haven't run it to find out.
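For reference, here is a minimal sketch of what a working double loop could look like. It runs on a small made-up matrix called sim (hypothetical values, not the data from the question), skips the diagonal explicitly, and also records where the maximum was found. The names sim, biggest_row and biggest_col are only illustrative.

```r
# Small symmetric example matrix (hypothetical values, not the asker's data)
sim <- matrix(c(1.0, 0.5, 0.3,
                0.5, 1.0, 0.7,
                0.3, 0.7, 1.0), nrow = 3, byrow = TRUE)

biggest_value <- -Inf   # start below any possible similarity
biggest_row <- NA
biggest_col <- NA

for (row in seq_len(nrow(sim))) {
  for (col in seq_len(ncol(sim))) {
    # Skip the diagonal, where each item is compared with itself
    if (row != col && sim[row, col] > biggest_value) {
      biggest_value <- sim[row, col]
      biggest_row <- row
      biggest_col <- col
    }
  }
}

biggest_value                 # 0.7
c(biggest_row, biggest_col)   # 2 3
```

Because the comparison is strict (>), only the first occurrence of the maximum is kept, so the symmetric duplicate at row 3, column 2 is not reported.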

As noted in the comments, you can set the diagonal elements of the matrix to 0 and then take the maximum of the matrix. You don't have any negative values, but in general you may prefer the maximum absolute value instead (or as well) if you want the strongest association. To find which pair yields that value, use which() (see ?which). Consider:

diag(cos) <- 0 
max(cos)
# [1] 0.714095
which(cos==max(cos), arr.ind=TRUE) 
#      row col
# [1,]   4   2
# [2,]   2   4
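If negative values can occur and you want the strongest association regardless of sign, the same idea can be applied to the absolute values. The following is only a sketch on a made-up matrix sim (hypothetical values); it works on a copy called strength, so the original matrix keeps its diagonal.

```r
# Hypothetical example with one strong negative association
sim <- matrix(c( 1.0, -0.8, 0.3,
                -0.8,  1.0, 0.7,
                 0.3,  0.7, 1.0), nrow = 3, byrow = TRUE)

strength <- abs(sim)       # copy of the absolute values; sim itself is unchanged
diag(strength) <- 0        # ignore self-similarity
hits <- which(strength == max(strength), arr.ind = TRUE)

hits        # positions of the strongest pair (both symmetric entries)
sim[hits]   # the signed values at those positions
```

Indexing sim with the two-column matrix hits returns the original signed values, so you can still see whether the strongest association was positive or negative.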

Upvotes: 1

deschen

Reputation: 10996

diag(cos) <- 0

which(cos == max(cos), arr.ind = TRUE)

Note that since your matrix is symmetric, you'll get each maximum twice, e.g. row 4, column 2 and row 2, column 4.

You can set the upper triangle (including the diagonal) to missing first to prevent this:

cos[upper.tri(cos, diag = TRUE)] <- NA

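and then use the which function, calling max() with na.rm = TRUE so that the NA entries are ignored (otherwise max() returns NA and which() finds nothing).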
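Put together, that could look like the sketch below. It uses a small made-up matrix sim (hypothetical values) and a copy called lower, both of which are just illustrative names.

```r
# Small symmetric example matrix (hypothetical values)
sim <- matrix(c(1.0, 0.5, 0.3,
                0.5, 1.0, 0.7,
                0.3, 0.7, 1.0), nrow = 3, byrow = TRUE)

lower <- sim
lower[upper.tri(lower, diag = TRUE)] <- NA   # keep only the lower triangle

best <- max(lower, na.rm = TRUE)             # na.rm = TRUE skips the NA entries
pos <- which(lower == best, arr.ind = TRUE)  # NA comparisons are dropped by which()

best   # 0.7
pos    # row 3, col 2
```

Each pair now appears only once, so a unique maximum yields a single row in pos.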

Upvotes: 2
