Reputation: 21
x_tsvd is a matrix with 4.6 million rows and svd_tfidf is a matrix with 1862 rows. Both matrices have the same number of columns (260).
I want to calculate the cosine similarity between each of the 4.6 million rows of x_tsvd and each of the 1862 rows of svd_tfidf.
Is there any way I can optimize this so that it takes less time?
import numpy as np
from numpy.linalg import norm

best_match = []
keys = np.array(df_5M['file'])
values = np.array(df['file'])
for i in range(len(x_tsvd)):
    array_ = []
    for j in range(len(svd_tfidf)):
        cosine_similarity_ = np.dot(x_tsvd[i], svd_tfidf[j]) / (norm(x_tsvd[i]) * norm(svd_tfidf[j]))
        array_.append(cosine_similarity_)
    index = np.array(array_).argsort()
    best_match.append({keys[i]: values[index][::-1][0:5]})
Update:
import numpy as np
from numpy.linalg import norm

best_match = []
keys = np.array(df_5M['file'])
values = np.array(df['file'])
for i in range(len(x_tsvd)):
    a = x_tsvd[i]
    b = svd_tfidf
    a_dot_b = np.sum(np.multiply(a, b), axis=1)
    norm_a = norm(a)
    norm_b = norm(b, axis=1)
    cosine_similarity_ = a_dot_b / (norm_a * norm_b)
    index = np.argsort(cosine_similarity_)
    best_match.append({keys[i]: values[index][::-1][0:6]})
Upvotes: 0
Views: 159
Reputation: 50806
There are several issues in your code. First of all, norm(x_tsvd[i]) is recomputed len(svd_tfidf)=1862 times, while the expression can be moved to the parent loop. Furthermore, norm(svd_tfidf[j]) is recomputed len(x_tsvd)=4.6e6 times, while it can be precomputed for all j values only once. Moreover, calling np.dot(x_tsvd[i], svd_tfidf[j]) in two nested loops is not efficient. You can use one big matrix multiplication instead: x_tsvd @ svd_tfidf.T. However, since the resulting matrix is huge (~64 GiB), it is reasonable to split x_tsvd into chunks of 512~4096 rows. Additionally, you can precompute the inverse of the norms, because multiplying by a precomputed inverse is generally significantly faster than dividing.
np.argsort(tmp_matrix[i])[::-1][0:5] is not efficient either; argpartition can be used instead to only extract the 5 best items (as I pointed out in a comment on the previous answer, which advised you to use argsort). Note that a partition does not behave the same way as a sort if you care about equal items (i.e. a stable sort). There is no stable partitioning implementation available in NumPy yet.
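For instance, here is a small sketch (with made-up scores) showing the behavioural difference between the two approaches:

import numpy as np

scores = np.array([0.1, 0.9, 0.3, 0.7, 0.8, 0.2, 0.5])

# argsort orders every item (O(n log n)), then the top 5 are kept:
top5_sorted = np.argsort(scores)[::-1][0:5]                     # -> [1, 4, 3, 6, 2]

# argpartition only guarantees that the last 5 positions hold the
# indices of the 5 best items, in no particular order (O(n)):
top5_unordered = np.argpartition(scores, len(scores) - 5)[-5:]

# Same set of items, but the partitioned result is not sorted by score.
assert set(top5_sorted) == set(top5_unordered)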
In the end, the optimized implementation should look like this:
# keys, values and best_match are defined as in the question above
chunk_size = 1024  # any value in the 512~4096 range discussed above
inv_norm_j = 1.0 / norm_by_line(svd_tfidf)  # Horizontal vector
for chunk_start in range(0, len(x_tsvd), chunk_size):
    chunk_end = min(chunk_start + chunk_size, len(x_tsvd))
    x_tsvd_block = x_tsvd[chunk_start:chunk_end]
    inv_norm_i = 1.0 / norm_by_line(x_tsvd_block)[:, None]  # Vertical vector
    tmp_matrix = (x_tsvd_block @ svd_tfidf.T) * inv_norm_i * inv_norm_j
    # Top-5 indices per row (note: ordered by index here, not by similarity)
    best_match_values = values[np.sort(np.argpartition(tmp_matrix, len(svd_tfidf)-5)[:, -5:])[:, ::-1]]
    # Pure-Python part that can hardly be optimized
    for i in range(chunk_start, chunk_end):
        best_match.append({keys[i]: best_match_values[i - chunk_start]})
Here, norm_by_line computes the norm of each row and can be implemented in a vectorized way (with SciPy, for example). Note that this is an untested draft and not code that you should trust completely and copy-paste blindly ;).
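For reference, a minimal sketch of such a helper, here using NumPy's np.linalg.norm rather than SciPy (the name norm_by_line simply matches the draft above):

import numpy as np

def norm_by_line(mat):
    # L2 norm of every row in a single vectorized call, equivalent to
    # np.array([np.linalg.norm(row) for row in mat]) but much faster.
    return np.linalg.norm(mat, axis=1)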
Regarding the recent update (which is code computing a different result), most of the optimizations are identical, but there is a big improvement you can make to np.sum(np.multiply(a, b), axis=1). Indeed, you can use np.einsum('ij,ij->i', a, b) instead, so as not to compute the large, expensive temporary matrix. It is 3 times faster on my machine.
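As an illustration, here is a minimal sketch of that equivalence, assuming a and b are two 2-D arrays of the same (made-up) shape:

import numpy as np

rng = np.random.default_rng(0)
a = rng.random((1862, 260))
b = rng.random((1862, 260))

slow = np.sum(np.multiply(a, b), axis=1)  # materializes a full (1862, 260) temporary
fast = np.einsum('ij,ij->i', a, b)        # same row-wise dot products, no large temporary

assert np.allclose(slow, fast)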
Upvotes: 1