zhimin.feng

Reputation: 161

Compute cosine distance for the Cartesian product of two DataFrames

The data looks like this:

import pandas as pd

u_df = pd.Series({'a':[0,0.11,0.22],'b':[0.92,0.11,0.65],'c':[0.2,0.5,0.23]}).reset_index()
u_df.columns = ['key','value']
v_df = pd.Series({'g':[0.5,0.21,0.5],'f':[0.12,0.191,0.68],'e':[0.2,0.1,0.23]}).reset_index()
v_df.columns = ['key','value']

    key        value
0   a     [0, 0.11, 0.22]
1   b  [0.92, 0.11, 0.65]
2   c    [0.2, 0.5, 0.23]

    key         value
0   e     [0.2, 0.1, 0.23]
1   f  [0.12, 0.191, 0.68]
2   g     [0.5, 0.21, 0.5]

I want to compute the cosine distance for the Cartesian product of these two DataFrames. For two lists, I compute it like this:

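import itertools
import numpy as np

def dot(K, L):
    # dot product of two equal-length sequences; returns 0 on a length mismatch
    if len(K) != len(L):
        return 0
    return sum(i[0] * i[1] for i in zip(K, L))

def similarity(item_1, item_2):
    # cosine of the angle between the two vectors
    return dot(item_1, item_2) / np.sqrt(dot(item_1, item_1) * dot(item_2, item_2))

# target_features and train_features are dicts mapping a key to its vector
similarities = {item: similarity(target_features[item[0]], train_features[item[1]]) for item in itertools.product(target_features, train_features)}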

But I want to compute it directly from the DataFrames, and I want the final result to look like:

    key1   key2      value
0   a       e      0.780720058
1   a       f      0.968164605
2   a       g      0.733602842
3   b       e      0.948870564
4   b       f      0.707152537
……
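
For reference, the expected values above can be reproduced with a small standard-library sketch of the same pure-Python approach. It assumes target_features and train_features are plain dicts mapping each key to its vector (the post does not show how they are built), and it uses math.sqrt in place of np.sqrt only to stay dependency-free:

```python
import itertools
import math

# Assumed shape of the inputs: key -> vector, mirroring u_df and v_df.
target_features = {'a': [0, 0.11, 0.22], 'b': [0.92, 0.11, 0.65], 'c': [0.2, 0.5, 0.23]}
train_features = {'e': [0.2, 0.1, 0.23], 'f': [0.12, 0.191, 0.68], 'g': [0.5, 0.21, 0.5]}

def dot(K, L):
    # dot product of two equal-length sequences
    return sum(k * l for k, l in zip(K, L))

def similarity(item_1, item_2):
    # cosine of the angle between the two vectors
    return dot(item_1, item_2) / math.sqrt(dot(item_1, item_1) * dot(item_2, item_2))

# one entry per (key1, key2) pair of the Cartesian product
similarities = {
    (k1, k2): similarity(target_features[k1], train_features[k2])
    for k1, k2 in itertools.product(target_features, train_features)
}

for (k1, k2), value in similarities.items():
    print(k1, k2, round(value, 9))
```

The dict holds one value per key pair, nine in total for this sample, which is the same information as the three-column table wanted above.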

Upvotes: 1

Views: 430

Answers (1)

jezrael

Reputation: 863281

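You can do a cross join with merge first and then use apply to compute the cosine similarity for each row (1 minus SciPy's cosine distance, which matches the values you expect):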

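import pandas as pd
from scipy.spatial.distance import cosine

# constant helper column so the merge pairs every row of u_df with every row of v_df
u_df['tmp'] = 1
v_df['tmp'] = 1
df = pd.merge(u_df, v_df, on='tmp', how='outer')
# cosine() returns the distance, so 1 - distance gives the similarity
df['value'] = df.apply(lambda x: 1 - cosine(x["value_x"], x["value_y"]), axis=1)
df = df[['key_x','key_y','value']]
print(df)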
  key_x key_y     value
0     a     e  0.780720
1     a     f  0.968165
2     a     g  0.733603
3     b     e  0.948871
4     b     f  0.707153
5     b     g  0.967946
6     c     e  0.760748
7     c     f  0.657643
8     c     g  0.740844
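
If the frames grow large, the row-wise apply may become slow. One possible alternative, as a rough sketch only, is to stack the vectors into NumPy arrays, normalize each row, and take a single matrix product. The names U, V, sim and result are illustrative, and the sketch assumes every list in the value column has the same length. The frames are rebuilt with pd.DataFrame here only to keep the snippet self-contained:

```python
import numpy as np
import pandas as pd

u_df = pd.DataFrame({'key': ['a', 'b', 'c'],
                     'value': [[0, 0.11, 0.22], [0.92, 0.11, 0.65], [0.2, 0.5, 0.23]]})
v_df = pd.DataFrame({'key': ['e', 'f', 'g'],
                     'value': [[0.2, 0.1, 0.23], [0.12, 0.191, 0.68], [0.5, 0.21, 0.5]]})

# Stack the list column into 2-D arrays, one row per key.
U = np.array(u_df['value'].tolist(), dtype=float)
V = np.array(v_df['value'].tolist(), dtype=float)

# Scale every row to unit length; the dot product of unit vectors is the cosine similarity.
U_unit = U / np.linalg.norm(U, axis=1, keepdims=True)
V_unit = V / np.linalg.norm(V, axis=1, keepdims=True)
sim = U_unit @ V_unit.T  # shape (len(u_df), len(v_df))

# Flatten row by row so the pairs come out as (a,e), (a,f), (a,g), (b,e), ...
result = pd.DataFrame({
    'key1': np.repeat(u_df['key'].to_numpy(), len(v_df)),
    'key2': np.tile(v_df['key'].to_numpy(), len(u_df)),
    'value': sim.ravel(),
})
print(result)
```

This avoids the temporary tmp column and the Python-level loop inside apply, at the cost of holding the full similarity matrix in memory.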

Upvotes: 1
