Reputation: 2014
I have two pandas DataFrames with the following shapes:
df.shape (1,8)
df1.shape (14,8)
I'd like to calculate cosine_similarity of df with each row in df1. Here's some sample data:
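I'm attempting to do something like this, where row_arr holds the values of each row:
from sklearn.metrics.pairwise import cosine_similarity

def cosine_calc(row_arr):
    # cosine_similarity expects 2D inputs, so reshape each 1D vector to (1, n_features)
    val = cosine_similarity(df.iloc[0].values.reshape(1, -1), row_arr.reshape(1, -1))
    return val[0, 0]

# Apply the function to each row
dfComp['Cosine_val'] = dfComp.apply(lambda x: cosine_calc(x.values), axis=1)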
Upvotes: 1
Views: 470
Reputation: 16856
from sklearn.metrics.pairwise import cosine_similarity
# Returns an array of shape (1, 14): one similarity value per row of df1
cosine_similarity(df.values, df1.values)
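As a rough end-to-end sketch of the call above: the random data and the column names f0..f7 are made up to match the shapes in the question, and Cosine_val is borrowed from the question's own column name. It flattens the (1, 14) result so it can be stored next to the rows it describes.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in data with the shapes from the question
rng = np.random.default_rng(0)
cols = [f"f{i}" for i in range(8)]
df = pd.DataFrame(rng.random((1, 8)), columns=cols)
df1 = pd.DataFrame(rng.random((14, 8)), columns=cols)

# Result has shape (1, 14): one similarity per row of df1
sims = cosine_similarity(df.values, df1.values)

# Flatten to length 14 and store it alongside the rows it describes
df1["Cosine_val"] = sims.ravel()
```

Computing sims before adding the new column matters here: once Cosine_val is attached, df1.values has 9 columns and no longer lines up with df.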
Test case: the cosine similarity of a matrix (an array of vectors) with itself should be symmetric:
import numpy as np

sim = cosine_similarity(df1.values, df1.values)
# Compare with a tolerance rather than exact equality, since these are floats
assert np.allclose(sim, sim.T)
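For intuition about why the result is symmetric, here is a minimal plain-NumPy sketch of the underlying formula (dot product divided by the product of the vector norms). The helper name cosine_sim_matrix is hypothetical, not part of scikit-learn, and the sketch assumes no row is all zeros, since that would divide by zero.

```python
import numpy as np

def cosine_sim_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Scale every row to unit length (assumes no all-zero rows)
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    # Dot products of unit vectors are the cosines of the angles between them
    return a_unit @ b_unit.T

x = np.array([[1.0, 0.0], [1.0, 1.0]])
result = cosine_sim_matrix(x, x)
```

Here the diagonal of result is 1 (each vector compared with itself) and the off-diagonal entries are both cos(45°), roughly 0.7071.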
Upvotes: 3