striatum

Reputation: 1598

Cosine Similarity over columns of two matrices (data.frames) in R

I have two matrices with a rather large number of columns, typically 1000 x 40000. I need the cosine similarity between corresponding rows. Previously, I was using apply(M, 2, FUN = function(v) cossim(v, V)), where M was a matrix but V was a vector. I cannot figure out how to straightforwardly turn the vector V into a matrix and then use only the required (corresponding) columns. Currently I am using a for-loop, but that is horribly inefficient. This is what my code looks like:

for (i in 1:nrow(m1)) {
    m1$CosSim[i] = cossim(as.numeric(m1[i,1:39998]),
        as.numeric(m2[i,1:39998]))
}

How can I make proper use of the apply family of functions, please?

Upvotes: 0

Views: 1116

Answers (1)

Grada Gukovic

Reputation: 1253

Avoid for() loops and apply() on matrices wherever possible; they slow everything down. The only exception to this rule that I know of is when one of the dimensions of the matrix is much smaller than the other and you are looping over exactly that smaller dimension.

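The following code computes the cosine similarity directly on the matrices (despite its name, cosine_dist returns the cosine itself, not 1 minus the cosine). It returns a vector of length nrow(xMat) whose n-th element is the cosine of the angle between the n-th row of xMat and the n-th row of yMat. Of course, xMat and yMat are assumed to have the same dimensions.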

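cosine_dist <- function(xMat, yMat){
     # Dot product of each pair of corresponding rows
     numerator <- rowSums(xMat * yMat)
     # Product of the Euclidean norms of each pair of corresponding rows
     denominator <- sqrt(rowSums(xMat^2))*sqrt(rowSums(yMat^2))
     return(numerator / denominator)
}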
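
As a rough usage sketch, here is how the same row-wise formula might be applied to data frames like the ones in the question and checked against a one-row-at-a-time loop. The toy sizes (5 x 8 instead of 1000 x 40000), the random data, and the helper name row_cosine are assumptions made for illustration only; they are not from the question.

```r
# Hypothetical toy data standing in for the large data frames in the question.
set.seed(1)
n_feat <- 8
m1 <- as.data.frame(matrix(rnorm(5 * n_feat), nrow = 5))
m2 <- as.data.frame(matrix(rnorm(5 * n_feat), nrow = 5))

# Same formula as in the answer, restated so this block runs on its own.
row_cosine <- function(xMat, yMat) {
  rowSums(xMat * yMat) / (sqrt(rowSums(xMat^2)) * sqrt(rowSums(yMat^2)))
}

# Convert the feature columns to numeric matrices once, outside any loop.
x <- as.matrix(m1[, seq_len(n_feat)])
y <- as.matrix(m2[, seq_len(n_feat)])
vectorised <- row_cosine(x, y)

# Reference result computed one row at a time, as the question's loop does.
looped <- vapply(seq_len(nrow(x)), function(i) {
  sum(x[i, ] * y[i, ]) / (sqrt(sum(x[i, ]^2)) * sqrt(sum(y[i, ]^2)))
}, numeric(1))

# Both approaches should agree up to floating-point tolerance.
agree <- isTRUE(all.equal(unname(vectorised), looped))

# Store the result as a new column, as in the question.
m1$CosSim <- vectorised
```

Note that a row consisting entirely of zeros has norm 0, so its entry comes out as NaN (0 / 0); such rows would need separate handling if they can occur in the data.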

Upvotes: 2
