Reputation: 7198
I have a Python face recognition pipeline where I am using the OpenFace model and an SVM to detect and recognize faces. The general steps I am following to recognize an image are below:
Training: Using an SVM, I am training on the face embeddings with the appropriate labels, like below:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
# Grid-search an RBF-kernel SVM over C and gamma on the face embeddings
params = {"C": [0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0], "gamma": [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]}
model = GridSearchCV(SVC(kernel="rbf", gamma="auto", probability=True), params, cv=3, n_jobs=-1)
model.fit(data["embeddings"], labels)
Testing: Extracting the face embedding of the test image and predicting the results like below:
model.predict_proba([test_embedding])  # test_embedding: the 128-D embedding of the test face
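For reference, this is roughly how I read the prediction afterwards (the names here are illustrative, not my exact code):
import numpy as np

probs = model.predict_proba([test_embedding])[0]
best = np.argmax(probs)
print("predicted:", model.classes_[best], "confidence:", probs[best])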
I have a dataset of random faces labeled "unknown" and a dataset of known-person faces. The problem is: with around 30 known-person images and around 10 unknown-person images, known people are recognized fine, but when an unknown person comes in, they are also recognized as a known person with high confidence, when they should actually be unknown.
If I add more random people to the unknown dataset, say around 50 images, while keeping around 30 known-person images, known people are still recognized fine but with low confidence, and an unknown person is now correctly recognized as unknown.
It looks like, for good face recognition results, we need roughly the same number of known and unknown person images, which is practically not possible, since the known-person images can grow to 100 or more for each known person we add. I am very confused and not sure what to do. Is there any other way of recognizing known/unknown persons? Please help. Thanks
Upvotes: 1
Views: 2590
Reputation: 10852
I don't think an SVM will work well here. It is a binary classifier by nature. It will try to compute the border between the two sets of 128-D points (the known and unknown classes), but these classes are not internally connected by any relation: in embedding space, a known face may be more similar to an unknown face than to another known face. That is a problem for SVM generalization. An SVM can be used on closed sets, but for unknown faces you have an open set.
It is more practical to use non-parametric methods and a Bayesian approach, computing likelihoods as a function of distance to the known data in embedding space, like in your previous question.
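For example, a minimal sketch of a simple distance-based variant of this idea: score a test face by its distance to the nearest known embedding and reject it if that distance is too large. The 0.8 threshold and the variable names are only assumptions; tune them on your own data.
from sklearn.neighbors import NearestNeighbors

# Index ONLY the embeddings of known people; no "unknown" training class is needed.
nn = NearestNeighbors(n_neighbors=1, metric="euclidean")
nn.fit(data["embeddings"])

def identify(test_embedding, threshold=0.8):
    # Distance to the closest known face in 128-D embedding space.
    dist, idx = nn.kneighbors([test_embedding], n_neighbors=1)
    if dist[0][0] > threshold:
        return "unknown"          # too far from every known face
    return labels[idx[0][0]]      # label of the nearest known embedding
The threshold plays the role of the likelihood cutoff; you can estimate it from the distribution of distances between images of the same known person.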
Upvotes: 1
Reputation: 102
It is normal that confidence decreases as the number of possible persons (number of labels) increases, because there are more possibilities. If I understand correctly, you have a label for each person and then an additional label for "unknown"? That is not the way to go, since "unknown" is then treated like any other person's embeddings. Instead, use a cutoff probability: everything that falls below it is considered unknown.
Remember that there is a trade-off between the size of your prediction space (more persons, more possibilities) and accuracy.
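For example, a rough sketch of such a cutoff on top of your existing GridSearchCV model, trained only on known persons (the 0.7 value is just a placeholder; pick it from a validation set):
import numpy as np

def predict_with_cutoff(model, test_embedding, cutoff=0.7):
    # Probabilities over the KNOWN persons only; "unknown" comes from the cutoff.
    probs = model.predict_proba([test_embedding])[0]
    best = np.argmax(probs)
    if probs[best] < cutoff:
        return "unknown"
    return model.classes_[best]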
Upvotes: 1