zzxczxczaa

Reputation: 185

Using the cosine_similarity function in Python

import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

a = np.array([[3,4],[2,5],[1,2],[1,2],[4,5]])

ap = pd.DataFrame(a, index=['Sonata','Etudes','Waltzes','Nocturnes','Marches'],columns=['search_history','view_count'])
ap


b = np.array([[4,4],[3,5],[2,1],[4,7],[1,2]])
bp = pd.DataFrame(b, index=['Sonata','Etudes','Waltzes','Nocturnes','Marches'],columns=['comment + wishlist','signup'])
bp


Then I applied the cosine_similarity function:

from sklearn.metrics.pairwise import cosine_similarity
pd.DataFrame(cosine_similarity(a, b),columns=['A','B'], index=['Sonata','Etudes','Waltzes','Nocturnes','Marches'])

This gives:

ValueError: Shape of passed values is (5, 5), indices imply (5, 2)
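The mismatch comes from what cosine_similarity(a, b) returns: for dense inputs it L2-normalises every row and then takes the dot product of each row of a with each row of b, giving one row per row of a and one column per row of b. The NumPy sketch below reproduces that computation by hand (it is an illustration of the formula, not scikit-learn's own code) to show the resulting shape, which is why only two column labels cannot fit.

```python
import numpy as np

a = np.array([[3, 4], [2, 5], [1, 2], [1, 2], [4, 5]], dtype=float)
b = np.array([[4, 4], [3, 5], [2, 1], [4, 7], [1, 2]], dtype=float)

# L2-normalise each row so that a plain dot product equals the cosine
a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)

# Every row of a compared with every row of b -> shape (len(a), len(b))
sim = a_unit @ b_unit.T
print(sim.shape)
```

With five rows in each input this prints (5, 5), so a DataFrame built from it needs five column labels, not two.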

So I changed it like this:

from sklearn.metrics.pairwise import cosine_similarity
pd.DataFrame(cosine_similarity(a, b),columns=['A','B','c','d','e'], index=['Sonata','Etudes','Waltzes','Nocturnes','Marches'])


That runs, but it returns a 5 × 5 DataFrame.

This is not the result I expected. Like the DataFrames a and b, I want the result to have five rows and two columns, but I always get five rows and five columns.

What should I do?

The expected result was something like this:

                  A         B
Sonata     0.989949  0.994692
Etudes     0.919145  0.987241
Waltzes    0.948683  0.997054
Nocturnes  0.948683  0.997054
Marches    0.993884  0.990992

Upvotes: 1

Views: 1109

Answers (1)

Guy

Reputation: 50949

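cosine_similarity(a, b) compares every row of the first array with every row of the second array. With five rows in each, that is 5 × 5 comparisons, so the result is a 5 × 5 matrix in which entry (i, j) is the similarity between row i of a and row j of b. The values you expect are the first two columns of that matrix, so you can slice the resulting DataFrame: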

df = pd.DataFrame(cosine_similarity(a, b), columns=['A', 'B', 'C', 'D', 'E'], index=['Sonata', 'Etudes', 'Waltzes', 'Nocturnes', 'Marches'])
print(df[['A', 'B']]) # by columns names
# or
print(df.iloc[:, 0:2]) # by columns indices

Output

                  A         B
Sonata     0.989949  0.994692
Etudes     0.919145  0.987241
Waltzes    0.948683  0.997054
Nocturnes  0.948683  0.997054
Marches    0.993884  0.990992
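Because column j of the matrix holds the comparison with row j of b, the A and B columns above are the similarities against b's Sonata and Etudes rows. For example, Sonata/A is (3·4 + 4·4) / (5 · √32) = 28 / 28.284 ≈ 0.989949. The sketch below labels the columns with the row names instead of placeholder letters, so the slice states what it selects; reusing the question's name list for both axes is an illustrative choice, not something the original answer does.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

names = ['Sonata', 'Etudes', 'Waltzes', 'Nocturnes', 'Marches']
a = np.array([[3, 4], [2, 5], [1, 2], [1, 2], [4, 5]])
b = np.array([[4, 4], [3, 5], [2, 1], [4, 7], [1, 2]])

# Rows follow a's names, columns follow b's names:
# df.loc[x, y] is the cosine similarity between row x of a and row y of b
df = pd.DataFrame(cosine_similarity(a, b), index=names, columns=names)

# The expected "A" and "B" columns are the comparisons with b's
# Sonata and Etudes rows
result = df[['Sonata', 'Etudes']]
print(result.round(6))
```

Selecting by name this way gives the same five-row, two-column table as slicing with df.iloc[:, 0:2].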

Upvotes: 1
