Reputation: 2140
I have a tensor like this:
tf_a1 = [[-0.65 0. 0. 0. 0.42 0. 0. 0.51 0. 0.34 0.]
[0. -0.51 0. 0. -0.52 0. 0. 0. 0.53 0.42 0.]
[0. 0.32 0. -0.50 0.34 0. 0. 0.39 0.32 0.52 0.]
[0. 0.23 0.37 0. 0. 0.37 0.37 0. 0.47 0.39 0.3 ]]
I want to compute the cosine similarity between the columns of this tensor: the first column against all the other columns, then the second column against all the others, and so on.
I did this with a for loop:
def cosine_score(x):
    for i, arr in enumerate(x):
        if i == 0:
            first = cosine_similarity(x[i,].reshape(1, -1), x)
        else:
            second = cosine_similarity(x[i,].reshape(1, -1), x)
            final = tf.concat((first, second), axis=0)
            first = final
    return final
sim_topics = cosine_score(tf_a1)
Now, when I want to include this in my model, I cannot use the for loop as it is. It seems I have to use tf.map_fn to iterate instead.
I also tried this:
def cosine_score(x):
    def cos_similarity(col):
        for i, arr in enumerate(col):
            if i == 0:
                first = cosine_similarity(col[i, ].reshape(1, -1), col)
            else:
                second = cosine_similarity(col[i, ].reshape(1, -1), col)
                final = tf.concat((first, second), axis=0)
                first = final
        return final
    sim = tf.map_fn(cos_similarity, x, dtype=tf.float32)
    return sim
But here I still need to remove the for loop. My problem is that if I remove the for loop and access each column separately, how can I access the rest of the columns to compare against and apply cosine similarity?
Please let me know if it's not clear.
Upvotes: 1
Views: 279
Reputation: 5555
Cosine similarity is nothing more than an L2-normalized dot product. So, in TensorFlow, this should do the trick for you:
# Normalize the columns of the tensor
normalized_tensor = tf.math.l2_normalize(tf_a1, axis=0)
# Get the dot product between the columns
scores = tf.matmul(normalized_tensor, normalized_tensor, transpose_a=True)
The tensor scores contains the cosine similarity between the columns of tf_a1: entry [i, j] is the similarity between column i and column j. Moreover, below is an equivalent NumPy implementation:
# Normalize the columns of the tensor
normalized_tensor = tf_a1 / np.linalg.norm(tf_a1, axis=0)
# Get the dot product between the columns
scores = np.dot(normalized_tensor.T, normalized_tensor)
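To check the claim that cosine similarity is just an L2-normalized dot product, here is a minimal NumPy sketch comparing an explicit pairwise loop with the matrix form. The helper names `pairwise_columns_loop` and `pairwise_columns_matmul` are made up for this illustration, and the sketch assumes no column is all zeros (a zero column would give a division by zero).

```python
import numpy as np

# The tensor from the question as a NumPy array (4 rows, 11 columns).
tf_a1 = np.array([
    [-0.65, 0.0, 0.0, 0.0, 0.42, 0.0, 0.0, 0.51, 0.0, 0.34, 0.0],
    [0.0, -0.51, 0.0, 0.0, -0.52, 0.0, 0.0, 0.0, 0.53, 0.42, 0.0],
    [0.0, 0.32, 0.0, -0.50, 0.34, 0.0, 0.0, 0.39, 0.32, 0.52, 0.0],
    [0.0, 0.23, 0.37, 0.0, 0.0, 0.37, 0.37, 0.0, 0.47, 0.39, 0.3],
])

def pairwise_columns_loop(a):
    """Cosine similarity of every column pair, straight from the definition."""
    n = a.shape[1]
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            u, v = a[:, i], a[:, j]
            out[i, j] = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return out

def pairwise_columns_matmul(a):
    """Same result without loops: normalize each column, then A^T A."""
    # Assumption: every column has a non-zero norm.
    normalized = a / np.linalg.norm(a, axis=0, keepdims=True)
    return normalized.T @ normalized

loop_scores = pairwise_columns_loop(tf_a1)
matmul_scores = pairwise_columns_matmul(tf_a1)
print(np.allclose(loop_scores, matmul_scores))
```

Both functions return an 11 x 11 symmetric matrix with ones on the diagonal. Columns that point in the same direction, such as columns 2 and 10 here, get a similarity of 1 regardless of their magnitudes.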
Finally, if you want to keep only one of the triangles (for example, the upper triangle) and set the main diagonal to 0, you can do the following in TensorFlow:
zero_diag = tf.linalg.set_diag(scores, tf.zeros(tf.shape(scores)[0]))
# tf.linalg.band_part is the TensorFlow 2.x name for tf.matrix_band_part
triangular = tf.linalg.band_part(zero_diag, 0, -1)
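For the NumPy version, the same "upper triangle with a zeroed diagonal" result can probably be had in one call with `np.triu`. This is a small sketch on a made-up 3 x 3 matrix standing in for `scores`; the values are illustrative only and are not computed from tf_a1.

```python
import numpy as np

# Small symmetric matrix standing in for `scores` (illustrative values only).
scores = np.array([
    [1.0, 0.2, 0.5],
    [0.2, 1.0, 0.3],
    [0.5, 0.3, 1.0],
])

# k=1 keeps only entries strictly above the main diagonal,
# so the diagonal and the lower triangle both become 0.
upper = np.triu(scores, k=1)

# Indices of the unique column pairs (i < j), handy for listing each pair once.
rows, cols = np.triu_indices(scores.shape[0], k=1)
pairs = [(int(i), int(j), float(scores[i, j])) for i, j in zip(rows, cols)]
print(upper)
print(pairs)
```

Listing the pairs via `np.triu_indices` avoids reporting each similarity twice, since the full matrix is symmetric.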
Upvotes: 1