Reputation: 11
In my humble opinion, the simple answer to this question, namely that it implements the semi-hard triplet loss function exactly as described in the paper "FaceNet: A Unified Embedding for Face Recognition and Clustering", is not true. Contrary to the paper, it does not use all semi-hard triplets in a batch, but only the hardest semi-hard triplet per anchor-positive pair, i.e. the semi-hard triplet where the negative is closest to the anchor (but still farther away than the positive, of course). The comments in the code call these negatives_outside. If no semi-hard negative can be found for an anchor-positive pair, it takes the easiest negative, i.e. the negative farthest away from the anchor, to complete the triplet (negatives_inside). Does anybody know where they got this from, what the rationale behind it is, or whether my understanding of the paper is wrong?
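To make the selection rule I am describing concrete, here is a compact sketch of how I read it for a single anchor-positive pair (illustration only, not the tfa code itself; select_negative, the precomputed distance matrix D and the margin alpha are just names I made up for this snippet):
import numpy as np

def select_negative(D, labels, a, p, alpha=1.0):
    """Illustration of the selection rule as I understand it, not the tfa implementation.
    D is a precomputed pairwise (squared) distance matrix, labels is a 1-D label array."""
    d_ap = D[a, p]
    neg = np.where(labels != labels[a])[0]                        # indices of all negatives for this anchor
    semi = neg[(D[a, neg] > d_ap) & (D[a, neg] < d_ap + alpha)]   # semi-hard: d_ap < d_an < d_ap + alpha
    if semi.size > 0:
        return semi[np.argmin(D[a, semi])]   # "negatives_outside": hardest (closest) semi-hard negative
    return neg[np.argmax(D[a, neg])]         # "negatives_inside": easiest (farthest) negative as fallback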
To make sure my understanding of the tensorflow_addons version of the semi-hard triplet loss function is correct, I recoded it in plain Python, which is much easier to follow than the TensorFlow version with its heavy tensor algebra:
import numpy as np
def _pairwiseDistances(embeddings, squared=False):
    D = np.zeros((embeddings.shape[0], embeddings.shape[0]), dtype=np.float32)
    for k in range(embeddings.shape[0]):
        for s in range(k+1, embeddings.shape[0]):
            d = embeddings[k,:] - embeddings[s,:]
            d = np.sum(d*d)
            D[k,s] = d
            D[s,k] = d
    if not squared:
        D = np.sqrt(D)
    return D
def semiHardTripletLoss(labels, embeddings, alpha=1., normalized=True, squared=True):
    N = embeddings.shape[0]
    distances = _pairwiseDistances(embeddings, squared)  # calculate pairwise distance matrix
    L = 0.
    count = 0
    for a in range(N):      # give every embedding in the batch the chance to be an anchor
        for p in range(N):  # try all positives for the anchor
            if a == p:
                continue    # positive cannot be the same as the anchor
            if labels[a] != labels[p]:
                continue    # positive must have the same label as the anchor
            Min = 1.e10
            Max = 1.e-10
            n0 = -1
            for n in range(N):  # find a suitable negative
                if labels[a] == labels[n]:
                    continue    # negative must have a different label than the anchor
                if distances[a,n] > Max:
                    Max = distances[a,n]  # this will give the easiest negative if no semi-hard negative is found
                if distances[a,p] >= distances[a,n] or distances[a,n] >= distances[a,p] + alpha:
                    continue    # make sure the negative is semi-hard
                if distances[a,n] < Min:
                    n0 = n
                    Min = distances[a,n]  # keep track of the hardest semi-hard negative
            if n0 == -1:  # no semi-hard negative found, fall back to the easiest negative
                l = np.maximum(distances[a,p] - Max + alpha, 0)
                #print('a={:d}, p={:d}, n0={:d}, Max={:f}, l={:f}'.format(a,p,n0,Max,l))
            else:         # n0 is the hardest semi-hard negative
                l = np.maximum(distances[a,p] - distances[a,n0] + alpha, 0)
                #print('a={:d}, p={:d}, n0={:d}, d[a,n0]={:f}, l={:f}'.format(a,p,n0,distances[a,n0],l))
            L += l
            count += 1
    if normalized and count > 0:
        L /= count
    #print('count = {:d}'.format(count))
    return L
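As a quick illustration, the function can be called directly on a tiny made-up batch (the numbers are arbitrary, two subjects with two samples each):
emb = np.array([[0.0, 1.0],
                [0.1, 0.9],
                [1.0, 0.0],
                [0.9, 0.1]], dtype=np.float32)  # made-up embeddings
lab = np.array([0, 0, 1, 1])
print(semiHardTripletLoss(lab, emb))            # scalar loss averaged over all anchor-positive pairs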
I tested this code with random features against the original and printed the difference between the two:
import tensorflow as tf
import Loss
import semiHardTripletLossNumpy as tln # import the numpy version posted above here
import numpy as np
import tensorflow_addons as tfa
tf.config.set_visible_devices([], 'GPU') # not worth bothering the GPU
batchSize = 20
nFeatures = 11
nSubjects = 7
Embedding = tf.Variable(np.random.rand(batchSize, nFeatures), dtype=tf.float32)
Embedding = tf.math.l2_normalize(Embedding, axis=1)
Label = tf.constant(np.random.randint(low=0, high=nSubjects, size=batchSize), dtype=tf.float32)
result1 = tfa.losses.triplet_semihard_loss(Label.numpy(), Embedding.numpy(), distance_metric='squared-L2')
result2 = tln.semiHardTripletLoss(Label, Embedding)
print(result1.numpy(), '-', result2, '=', result1.numpy()-result2)
I ran this many times, with different values for batchSize, nFeatures and nSubjects and always got something like:
0.96045184 - 0.9604518755718514 = -3.421748129284197e-08
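If you want to automate that comparison instead of eyeballing single runs, something along these lines should work (just a sketch, the ranges and the tolerance are arbitrary; it assumes the same imports as above):
for trial in range(100):
    bs = np.random.randint(12, 40)   # batch size
    nf = np.random.randint(4, 32)    # feature dimension
    ns = np.random.randint(2, 8)     # number of subjects
    emb = tf.math.l2_normalize(tf.constant(np.random.rand(bs, nf), dtype=tf.float32), axis=1)
    lab = tf.constant(np.random.randint(low=0, high=ns, size=bs), dtype=tf.float32)
    r1 = tfa.losses.triplet_semihard_loss(lab.numpy(), emb.numpy(), distance_metric='squared-L2')
    r2 = tln.semiHardTripletLoss(lab, emb)
    np.testing.assert_allclose(r1.numpy(), r2, atol=1e-6)  # raises if the two versions diverge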
Upvotes: 1
Views: 335