sudojarvis

Reputation: 21

Optimization of nested for loop in Python

x_tsvd is a matrix with 4.6 million rows and svd_tfidf is a matrix with 1862 rows.

Both matrices have the same number of columns (260).

I want to calculate the cosine similarity of each of the 4.6M rows of x_tsvd against each of the 1862 rows of svd_tfidf.

Is there any way I can optimize it so that it takes less time?

import numpy as np
from numpy.linalg import norm

best_match = []

keys = np.array(df_5M['file'])
values = np.array(df['file'])

for i in range(len(x_tsvd)):
    array_ = []
    for j in range(len(svd_tfidf)):
        cosine_similarity_ = np.dot(x_tsvd[i], svd_tfidf[j]) / (norm(x_tsvd[i]) * norm(svd_tfidf[j]))
        array_.append(cosine_similarity_)
    index = np.array(array_).argsort()
    best_match.append({keys[i]: values[index][::-1][0:5]})  # keep the 5 best matches
    

Update:

import numpy as np
from numpy.linalg import norm

best_match = []

keys = np.array(df_5M['file'])
values = np.array(df['file'])

for i in range(len(x_tsvd)):
    a = x_tsvd[i]
    b = svd_tfidf
    a_dot_b = np.sum(np.multiply(a, b), axis=1)  # dot products of row i against every row of svd_tfidf
    norm_a = norm(a)
    norm_b = norm(b, axis=1)
    cosine_similarity_ = a_dot_b / (norm_a * norm_b)
    index = np.argsort(cosine_similarity_)
    best_match.append({keys[i]: values[index][::-1][0:6]})  # keep the 6 best matches

Upvotes: 0

Views: 159

Answers (1)

Jérôme Richard

Reputation: 50806

There are several issues in your code. First of all, norm(x_tsvd[i]) is recomputed len(svd_tfidf)=1862 times, while the expression can be moved to the parent loop. Furthermore, norm(svd_tfidf[j]) is recomputed len(x_tsvd)=4.6e6 times, while it can be precomputed once for all j values. Moreover, calling np.dot(x_tsvd[i],svd_tfidf[j]) in two nested loops is not efficient. You can use one big matrix multiplication instead: x_tsvd @ svd_tfidf.T. However, since this matrix is huge (4.6e6 × 1862 double-precision values ≈ 64 GiB), it is reasonable to split x_tsvd into chunks of 512~4096 rows. Additionally, you can precompute the inverse of the norms, because multiplying by the inverse is generally significantly faster than dividing.

np.argsort(tmp_matrix[i])[::-1][0:5] is not efficient; argpartition can be used instead to compute only the 5 best items (as I pointed out in a comment on the previous answer, which advised you to use argsort). Note that a partition does not behave the same way as a sort if you care about equal items (i.e. a stable sort). There is no stable partitioning implementation available in NumPy yet.
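
For instance, here is a minimal illustrative sketch (separate from the draft below) of the argpartition-based top-5 selection, assuming a hypothetical 1-D array sims holding the similarities of one x_tsvd row against all svd_tfidf rows:

import numpy as np

sims = np.random.rand(1862)  # hypothetical similarities of one x_tsvd row

# argsort: sorts all 1862 values just to extract the 5 best
top5_argsort = np.argsort(sims)[::-1][:5]

# argpartition: only separates the 5 largest, then sorts those 5 by value
idx = np.argpartition(sims, len(sims) - 5)[-5:]
top5_argpartition = idx[np.argsort(sims[idx])[::-1]]

assert np.array_equal(top5_argsort, top5_argpartition)  # same indices (barring ties)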

In the end the optimized implementation should look like:

chunk_size = 1024  # chunks of 512~4096 rows keep the temporary matrix small
best_match = []
inv_norm_j = 1.0 / norm_by_line(svd_tfidf)  # horizontal vector

for chunk_start in range(0, len(x_tsvd), chunk_size):
    chunk_end = min(chunk_start + chunk_size, len(x_tsvd))
    x_tsvd_block = x_tsvd[chunk_start:chunk_end]
    inv_norm_i = 1.0 / norm_by_line(x_tsvd_block)[:,None]  # vertical vector
    tmp_matrix = (x_tsvd_block @ svd_tfidf.T) * inv_norm_i * inv_norm_j
    # top-5 per row (note: ordered by index here, not by similarity)
    best_match_values = values[np.sort(np.argpartition(tmp_matrix, len(svd_tfidf)-5)[:,-5:])[:,::-1]]
    # Pure-Python part that can hardly be optimized
    for i in range(chunk_start, chunk_end):
        best_match.append({keys[i]: best_match_values[i - chunk_start]})

Where norm_by_line can be computed in a vectorized way (for example with NumPy or SciPy). Note that this is an untested draft and not code that you should trust completely and copy-paste blindly ;) .
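
For instance, a minimal sketch of norm_by_line (a placeholder name used in the draft above, not an existing library function) using NumPy could be:

import numpy as np

def norm_by_line(m):
    # L2 norm of each row, equivalent to norm(m[i]) for every i, but vectorized
    return np.linalg.norm(m, axis=1)

A np.sqrt(np.einsum('ij,ij->i', m, m)) variant would also work and avoids materializing an intermediate array of squares.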


Regarding the recent update (which is code computing a different result), most of the optimizations are the same, but there is a big improvement you can make to np.sum(np.multiply(a,b),axis=1). Indeed, you can use np.einsum('ij,ij->i', a, b) instead so as not to compute the large, expensive temporary matrix. It is 3 times faster on my machine.
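
As a minimal sketch on made-up data matching the question's shapes (1862 rows, 260 columns): in the update a is a single 1-D row, so the subscripts become 'j,ij->i'; the 'ij,ij->i' form applies when both operands are 2-D blocks.

import numpy as np

a = np.random.rand(260)        # stands in for one row of x_tsvd
b = np.random.rand(1862, 260)  # stands in for svd_tfidf

slow = np.sum(np.multiply(a, b), axis=1)  # materializes a (1862, 260) temporary
fast = np.einsum('j,ij->i', a, b)         # same result without the large temporary

assert np.allclose(slow, fast)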

Upvotes: 1
