Reputation: 11
In my humble opinion, the simple answer to this question, namely that it implements the semi-hard triplet loss function exactly as described in the paper "FaceNet: A Unified Embedding for Face Recognition and Clustering", is not true. Contrary to the paper, it does not use all semi-hard triplets in a batch, but only the hardest semi-hard triplet per anchor-positive pair, i.e. the semi-hard triplet where the negative is closest to the anchor (but still farther away than the positive, of course). The comments in the code call these negatives_outside. If no semi-hard negative can be found for an anchor-positive pair, it takes the easiest negative, i.e. the negative farthest away from the anchor, to complete the triplet (negatives_inside). Does anybody know where they got this from, what the rationale behind it is, or whether my understanding of the paper is wrong?
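To make the selection rule I am describing concrete, here is a compact sketch of how I read it for a single anchor-positive pair (illustration only, not the tfa code itself; select_negative, the precomputed distance matrix D and the margin alpha are just names I made up for this snippet):
import numpy as np

def select_negative(D, labels, a, p, alpha=1.0):
    """Illustration of the selection rule as I understand it, not the tfa implementation.
    D is a precomputed pairwise (squared) distance matrix, labels is a 1-D label array."""
    d_ap = D[a, p]
    neg = np.where(labels != labels[a])[0]                        # indices of all negatives for this anchor
    semi = neg[(D[a, neg] > d_ap) & (D[a, neg] < d_ap + alpha)]   # semi-hard: d_ap < d_an < d_ap + alpha
    if semi.size > 0:
        return semi[np.argmin(D[a, semi])]   # "negatives_outside": hardest (closest) semi-hard negative
    return neg[np.argmax(D[a, neg])]         # "negatives_inside": easiest (farthest) negative as fallback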
To make sure my understanding of the tensorflow_addons version of the semi-hard triplet loss function is correct, I recoded it in plain Python, which is much easier to follow than the TensorFlow version with its heavy tensor algebra:
import numpy as np
def _pairwiseDistances(embeddings, squared=False):
    D = np.zeros((embeddings.shape[0], embeddings.shape[0]), dtype=np.float32)
    for k in range(embeddings.shape[0]):
        for s in range(k+1, embeddings.shape[0]):
            d = embeddings[k,:] - embeddings[s,:]
            d = np.sum(d*d)
            D[k,s] = d
            D[s,k] = d
    if not squared:
        D = np.sqrt(D)
    return D
def semiHardTripletLoss(labels, embeddings, alpha=1., normalized=True, squared=True):
    N = embeddings.shape[0]
    distances = _pairwiseDistances(embeddings, squared)  # calculate pairwise distance matrix
    L = 0.
    count = 0
    for a in range(N):      # give every embedding in the batch the chance to be an anchor
        for p in range(N):  # try all positives for the anchor
            if a == p:
                continue    # positive cannot be the same as the anchor
            if labels[a] != labels[p]:
                continue    # positive must have the same label as the anchor
            Min = 1.e10
            Max = 1.e-10
            n0 = -1
            for n in range(N):  # find a suitable negative
                if labels[a] == labels[n]:
                    continue    # negative must have a different label than the anchor
                if distances[a,n] > Max:
                    Max = distances[a,n]  # this will give the easiest negative if no semi-hard negative is found
                if distances[a,p] >= distances[a,n] or distances[a,n] >= distances[a,p] + alpha:
                    continue    # make sure the negative is semi-hard
                if distances[a,n] < Min:
                    n0 = n
                    Min = distances[a,n]  # keep track of the hardest semi-hard negative
            if n0 == -1:  # no semi-hard negative found, fall back to the easiest negative
                l = np.maximum(distances[a,p] - Max + alpha, 0)
                #print('a={:d}, p={:d}, n0={:d}, Max={:f}, l={:f}'.format(a,p,n0,Max,l))
            else:         # n0 is the hardest semi-hard negative
                l = np.maximum(distances[a,p] - distances[a,n0] + alpha, 0)
                #print('a={:d}, p={:d}, n0={:d}, d[a,n0]={:f}, l={:f}'.format(a,p,n0,distances[a,n0],l))
            L += l
            count += 1
    if normalized and count > 0:
        L /= count
    #print('count = {:d}'.format(count))
    return L
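As a quick illustration, the function can be called directly on a tiny made-up batch (the numbers are arbitrary, two subjects with two samples each):
emb = np.array([[0.0, 1.0],
                [0.1, 0.9],
                [1.0, 0.0],
                [0.9, 0.1]], dtype=np.float32)  # made-up embeddings
lab = np.array([0, 0, 1, 1])
print(semiHardTripletLoss(lab, emb))            # scalar loss averaged over all anchor-positive pairs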
I tested this code with random features against the original and printed the difference between the two:
import tensorflow as tf
import Loss
import semiHardTripletLossNumpy as tln # import the numpy version posted above here
import numpy as np
import tensorflow_addons as tfa
tf.config.set_visible_devices([], 'GPU') # not worth bothering the GPU
batchSize = 20
nFeatures = 11
nSubjects = 7
Embedding = tf.Variable(np.random.rand(batchSize, nFeatures), dtype=tf.float32)
Embedding = tf.math.l2_normalize(Embedding, axis=1)
Label = tf.constant(np.random.randint(low=0, high=nSubjects, size=batchSize), dtype=tf.float32)
result1 = tfa.losses.triplet_semihard_loss(Label.numpy(), Embedding.numpy(), distance_metric='squared-L2')
result2 = tln.semiHardTripletLoss(Label, Embedding)
print(result1.numpy(), '-', result2, '=', result1.numpy()-result2)
I ran this many times, with different values for batchSize, nFeatures and nSubjects and always got something like:
0.96045184 - 0.9604518755718514 = -3.421748129284197e-08
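If you want to automate that comparison instead of eyeballing single runs, something along these lines should work (just a sketch, the ranges and the tolerance are arbitrary; it assumes the same imports as above):
for trial in range(100):
    bs = np.random.randint(12, 40)   # batch size
    nf = np.random.randint(4, 32)    # feature dimension
    ns = np.random.randint(2, 8)     # number of subjects
    emb = tf.math.l2_normalize(tf.constant(np.random.rand(bs, nf), dtype=tf.float32), axis=1)
    lab = tf.constant(np.random.randint(low=0, high=ns, size=bs), dtype=tf.float32)
    r1 = tfa.losses.triplet_semihard_loss(lab.numpy(), emb.numpy(), distance_metric='squared-L2')
    r2 = tln.semiHardTripletLoss(lab, emb)
    np.testing.assert_allclose(r1.numpy(), r2, atol=1e-6)  # raises if the two versions diverge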
Upvotes: 1
Views: 335