kms

Reputation: 2014

Applying a (cosine) similarity measure - pandas dataframe

I have two pandas DataFrames with the following shapes:

df.shape (1,8) 

df1.shape (14,8) 

I'd like to calculate cosine_similarity of df with each row in df1. Here's some sample data:

data

I'm attempting something like this, where row_arr holds the values of each row:

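def cosine_calc(row_arr):
    # cosine_similarity expects 2-D inputs, so reshape both vectors to (1, n_features)
    val = cosine_similarity(df.iloc[0].values.reshape(1, -1), row_arr.reshape(1, -1))
    return val[0, 0]

# Apply function to each row (pass the row's values, not an undefined name)
dfComp['Cosine_val'] = dfComp.apply(lambda x: cosine_calc(x.values), axis=1)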

Upvotes: 1

Views: 470

Answers (1)

mujjiga

Reputation: 16856

from sklearn.metrics.pairwise import cosine_similarity
cosine_similarity(df.values, df1.values)
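A minimal sketch of how the call above might be used to get one value per row of df1, assuming both frames contain only numeric columns in the same order. The random data is a hypothetical stand-in for the real frames, and the column name Cosine_val is borrowed from the question:

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in data with the shapes given in the question
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1, 8)))
df1 = pd.DataFrame(rng.random((14, 8)))

# Result has shape (1, 14): one similarity per row of df1
sims = cosine_similarity(df.values, df1.values)

# Flatten to length 14 and store it as a new column
df1['Cosine_val'] = sims.ravel()
```

Because the whole comparison is a single vectorized call, no row-wise apply is needed.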

Test case: the cosine similarity of a matrix (an array of vectors) with itself should be symmetric.

import numpy as np

# Compare with a floating-point tolerance rather than exact equality
sim = cosine_similarity(df1.values, df1.values)
assert np.allclose(sim, sim.T)
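As a further sanity check, the result can be compared against the textbook definition, cos(a, b) = a·b / (|a| |b|). The following is only a sketch with hypothetical random arrays a and B standing in for df.values and df1.values:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-ins for df.values (1 x 8) and df1.values (14 x 8)
rng = np.random.default_rng(0)
a = rng.random((1, 8))
B = rng.random((14, 8))

# Dot product of a with every row of B, divided by the product of the norms
manual = (B @ a.ravel()) / (np.linalg.norm(B, axis=1) * np.linalg.norm(a))

# Should agree with scikit-learn up to floating-point tolerance
assert np.allclose(manual, cosine_similarity(a, B).ravel())
```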

Upvotes: 3
