Hello Lili

Reputation: 1587

FaceNet for dummies

The FaceNet algorithm (described in this article) uses a convolutional neural network to represent an image as a point in a 128-dimensional Euclidean space.

While reading the article I didn't understand:

  1. How does the loss function affect the convolutional network? (In normal networks, the weights are slightly changed via backpropagation in order to minimize the loss — so what happens in this case?)


  2. How are the triplets chosen?

    2.1. How do I know that a negative image is hard?

    2.2. Why is the loss function used to determine the negative image?

    2.3. When do I check my images for hardness with respect to the anchor? I believe that is before I send a triplet to be processed by the network, right?


Upvotes: 4

Views: 728

Answers (1)

Vijay Mariappan

Reputation: 17191

Here are some answers that may clarify your doubts:

  1. Even here the weights are adjusted to minimise the loss; it's just that the loss term is a little more complicated. The loss has two parts (separated by the + in the equation): the first part compares an image of a person to a different image of the same person, and the second part compares that image to an image of a different person. We want the first-part loss to be less than the second-part loss, and the loss equation in essence captures that. So here you basically want to adjust the weights such that the same-person error is small and the different-person error is large.
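To make the two parts of the loss concrete, here is a minimal numpy sketch of the triplet loss (function name and margin value are illustrative, not from the FaceNet code):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """FaceNet-style triplet loss: the anchor-positive distance should be
    smaller than the anchor-negative distance by at least the margin alpha."""
    pos_dist = np.sum((anchor - positive) ** 2)  # squared L2, same person
    neg_dist = np.sum((anchor - negative) ** 2)  # squared L2, different person
    # Hinge: zero loss once the negative is alpha further away than the positive.
    return max(pos_dist - neg_dist + alpha, 0.0)

a = np.array([0.0, 1.0])  # anchor embedding
p = np.array([0.0, 1.0])  # another image of the same person
n = np.array([1.0, 0.0])  # image of a different person
print(triplet_loss(a, p, n))  # 0.0 — negative is already far enough
```

Gradient descent on this quantity pulls the positive toward the anchor and pushes the negative away, which is exactly the weight adjustment described above.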

  2. The loss term involves three images: the image in question (the anchor) x_a, its positive pair x_p, and its negative pair x_n. The hardest positive of x_a is the positive image that has the biggest error compared to the rest of the positive images. The hardest negative of x_a is the closest image of a different person. So you want to bring the furthest positives close to each other and push the closest negatives further away. This is what the loss equation captures.
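A small sketch of what "hardest" means in embedding space — the function name is hypothetical, but the distance criterion is the one described above:

```python
import numpy as np

def hardest_positive_and_negative(anchor, positives, negatives):
    """Given one anchor embedding, return the hardest positive (the furthest
    embedding of the same person) and the hardest negative (the closest
    embedding of a different person), by squared Euclidean distance."""
    pos_dists = np.sum((positives - anchor) ** 2, axis=1)
    neg_dists = np.sum((negatives - anchor) ** 2, axis=1)
    return positives[np.argmax(pos_dists)], negatives[np.argmin(neg_dists)]

anchor = np.array([0.0, 0.0])
positives = np.array([[1.0, 0.0], [3.0, 0.0]])  # same person
negatives = np.array([[2.0, 0.0], [5.0, 0.0]])  # different people
hp, hn = hardest_positive_and_negative(anchor, positives, negatives)
print(hp, hn)  # [3. 0.] is furthest positive, [2. 0.] is closest negative
```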

  3. FaceNet selects its triplets during training (online). In each minibatch (which is a set of 40 images) they select the hardest negative for the anchor, and instead of choosing the hardest positive image, they use all anchor-positive pairs within the batch.

If you are looking to implement face recognition, you should also consider this paper, which uses centre loss; it is much easier to train and has been shown to perform better.

Upvotes: 3

Related Questions