Reputation: 3430
I am building a prototype for a face recognition system and while writing the algorithm, I had a few questions.
Algorithm:
Collect triplets (A(i), P(i), N(i)) - sets of anchor, positive, and negative images of employees working at XYX company.
Using gradient descent, minimize the triplet loss to learn the CNN parameters. In effect, this trains a Siamese network (the idea of running two identical CNNs on two different inputs, once on A(i)-P(i) and then on A(i)-N(i), and comparing the outputs).
These learned parameters ensure that the distance between the flattened n-dim encodings of two images of the same person is small, while the distance between encodings of different people is large.
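The objective described in steps 1-3 can be sketched as follows (a minimal NumPy sketch; the margin `alpha` and the precomputed encodings `f_a`, `f_p`, `f_n` are assumptions for illustration, not details from the post):

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Triplet loss over encodings f(A), f(P), f(N) of one triplet.

    Pushes the anchor-positive distance below the anchor-negative
    distance by at least the margin alpha (hypothetical value here).
    """
    pos_dist = np.sum((f_a - f_p) ** 2)   # ||f(A) - f(P)||^2
    neg_dist = np.sum((f_a - f_n) ** 2)   # ||f(A) - f(N)||^2
    return max(pos_dist - neg_dist + alpha, 0.0)
```

Gradient descent then adjusts the CNN weights so that this loss, summed over all training triplets, is minimized.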
Now, create a database in which you store the encoding of each training image of XYX company's employees.
Simply make a forward pass through the trained CNN and store the corresponding encoding of each image in the database.
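Step 4-5 can be sketched like this (the `encoder` callable stands in for the trained Siamese CNN; the name-to-image mapping is an assumed input format, not from the post):

```python
import numpy as np

def build_encoding_db(images, encoder):
    """One forward pass per employee image; store name -> encoding.

    `images` maps employee name -> image array; `encoder` is any
    callable mapping an image to its n-dim encoding (here a stand-in
    for the trained CNN).
    """
    return {name: encoder(img) for name, img in images.items()}
```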
At test time, you have the image of an XYX company employee and the image of an outsider.
You pass both test images through the CNN and get the corresponding encodings.
Now the question is: how would you find the similarity between the test picture's encoding and all the training-picture encodings in the database?
First question: would you use cosine similarity, or do I need to do something else? Can you add more clarity on this?
Second question: in terms of efficiency, how would you handle a scenario where you have 100,000 employees' training-picture encodings in the database and, for every new person, need to scan all 100,000 encodings, compute the cosine similarity, and return a result in < 2 seconds? Any suggestions on this part?
This problem can be mitigated by the second approach, where we use a learned distance function d(img1, img2) over a pair of employee images, as stated in points 1 to 3 above.
- My question is: when a new employee joins the organization, how would this learned distance function generalize, given that the new employee's images were not in the training set at all? Isn't this a problem of a changed data distribution between the train and test sets? Any suggestions in this regard?
Could anyone help me clear up these conceptual glitches?
Upvotes: 2
Views: 1355
Reputation: 3430
After doing a literature survey of face verification and recognition/detection research papers in computer vision, I think I have an answer to all of my questions, so I am answering them here.
First question: would you use cosine similarity? Can you add more clarity on this?
Find the minimum distance between the test encoding and every saved training-image encoding by simply computing the Euclidean distance between them.
Now keep a threshold, say 0.7: if the minimum distance is < 0.7, return the name of the employee; otherwise return a "not in the database" error.
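The matching rule above can be sketched as follows (a minimal NumPy sketch; the dict-based database layout is an assumption, and the 0.7 threshold is the example value from the answer, which in practice is data-dependent):

```python
import numpy as np

def identify(test_enc, db, threshold=0.7):
    """Return the closest employee name if within threshold, else None.

    `db` maps employee name -> stored encoding; distances are Euclidean.
    """
    best_name, best_dist = None, float("inf")
    for name, enc in db.items():
        dist = np.linalg.norm(test_enc - enc)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist < threshold:
        return best_name
    return None  # i.e. "not in the database"
```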
Second question: in terms of efficiency, how would you handle a scenario where you have 100,000 employees' training-picture encodings in the database and, for every new person, need to scan all 100,000 encodings, compute the cosine similarity, and return a result in < 2 seconds?
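One common way to make this scan fast (my own sketch, not stated in the answer: the 128-dim encoding size and the single stacked matrix are assumptions) is to stack all stored encodings into one matrix and compute the distances in a single vectorized operation rather than a Python loop; at 100,000 x 128 this runs well under the 2-second budget, and for much larger databases an approximate-nearest-neighbor index would be the next step:

```python
import numpy as np

# Assumed setup: 100,000 stored 128-dim encodings stacked row-wise.
rng = np.random.default_rng(0)
db_matrix = rng.normal(size=(100_000, 128)).astype(np.float32)

def nearest(test_enc, db_matrix):
    """Vectorized Euclidean search: one broadcasted matrix op, no loop."""
    dists = np.linalg.norm(db_matrix - test_enc, axis=1)
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])
```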
Third question: first of all, we learn the network parameters of the deep CNN (Siamese network) by minimizing the triplet loss function.
- Now, there is an assumption that these learned parameters can represent any human face. So you go ahead and save the new person's encoding in the database by making a forward pass through your network, and later use answer 1 to decide whether the person belongs to the organization or not (the face recognition problem). Moreover, the FaceNet paper mentions keeping a holdout set of around one million images that has the same distribution as the training set but disjoint identities, precisely to validate this generalization to unseen identities.
Upvotes: 2