S Andrew

Reputation: 7198

How to create an unknown-face dataset for face recognition in Python

I have a Python face recognition project where I use the OpenFace model and an SVM to detect and recognize faces. The general steps I follow to recognize an image are below:

  1. Detect the face using a face detection model: I use the OpenFace model instead of a Haar cascade because the cascade is not able to detect side faces
  2. Extract the face embedding: extract the 128-d face embedding using the OpenFace model
  3. Training: using an SVM, I train on the face embeddings with the appropriate labels, like below:

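    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Grid of C and gamma values to search for the RBF kernel
    params = {"C": [0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0], "gamma": [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]}

    # probability=True enables predict_proba; the grid's gamma values override gamma="auto"
    model = GridSearchCV(SVC(kernel="rbf", gamma="auto", probability=True), params, cv=3, n_jobs=-1)

    model.fit(data["embeddings"], labels)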

  4. Testing: extract the face embedding of the test image and predict the result, like below:

    model.predict_proba()

I have a dataset of random unknown faces and a dataset of known persons' faces. The problem is that with around 30 known-person images and around 10 unknown-person images, known persons are recognized fine, but any unknown person who comes in is also recognized as a known person with high confidence, when they should actually be labeled unknown.

If I add more random persons to the unknown dataset, say around 50 images against 30 known-person images, known persons are still recognized fine but with low confidence, and any unknown person who comes in is now recognized as unknown.

It looks like good face recognition results require approximately the same number of known and unknown person images, which is not practical, since the known-person images can grow to 100 or more for each known person we add. I am very confused here and not sure what to do. Is there any other way of recognizing known/unknown persons? Please help. Thanks

Upvotes: 1

Views: 2590

Answers (2)

Andrey Smorodov

Reputation: 10852

I don't think an SVM will work well here. It is natively a binary classifier. It will try to compute the boundary between two sets of 128-D points (the known and unknown classes), but the members of these classes are not related to each other in any consistent way: a known face may be closer to an unknown face than to another known face in embedding space. That makes it hard for the SVM to generalize. An SVM can be used on closed sets, but unknown faces form an open set.

It is more practical to use non-parametric methods with a Bayesian approach, computing likelihoods as a function of distance to the known data in embedding space, like in your previous question.
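As a rough illustration of the distance-based idea, here is a minimal sketch. It is plain nearest-neighbour matching with a distance cutoff, which is simpler than a full Bayesian likelihood estimate. The function name `nearest_known`, the toy 3-d vectors, and the `max_distance=0.6` value are all assumptions for illustration; a real cutoff would have to be tuned on your own 128-d embeddings.

```python
import math

def nearest_known(query, known_embeddings, known_labels, max_distance=0.6):
    """Return (label, distance) of the closest known embedding.

    The label is "unknown" when even the closest known embedding is
    farther away than max_distance (an assumed, untuned threshold).
    """
    best_label, best_distance = "unknown", math.inf
    for embedding, label in zip(known_embeddings, known_labels):
        distance = math.dist(query, embedding)  # Euclidean distance
        if distance < best_distance:
            best_label, best_distance = label, distance
    if best_distance > max_distance:
        return "unknown", best_distance
    return best_label, best_distance

# Toy 3-d vectors standing in for 128-d face embeddings (hypothetical data)
known = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
names = ["alice", "bob"]

close_label, close_dist = nearest_known([0.9, 0.1, 0.0], known, names)
far_label, far_dist = nearest_known([0.0, 0.0, 1.0], known, names)
print(close_label, far_label)
```

With this approach no unknown-face training set is needed: only known embeddings are stored, and anything too far from all of them is rejected.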

Upvotes: 1

emilioho2020

Reputation: 102

It is normal for confidence to decrease as the number of possible persons (number of labels) increases, as there are more possibilities. I'm trying to understand what you meant: do you have a label for each person and then an additional label for unknown? That is not the way to go, because unknown is then treated like any other person's embedding. You should use a cutoff probability and consider everything that falls below it unknown.
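A minimal sketch of such a cutoff, assuming the classifier is trained on known persons only. The helper name `label_with_cutoff` and the `cutoff=0.7` value are hypothetical; the right cutoff depends on your data and should be chosen on a validation set.

```python
def label_with_cutoff(probabilities, classes, cutoff=0.7):
    """Map one row of class probabilities to (label, probability).

    Returns "unknown" as the label when the highest probability is
    below the cutoff (an assumed value, not a recommended one).
    """
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    best_prob = float(probabilities[best_index])
    if best_prob < cutoff:
        return "unknown", best_prob
    return classes[best_index], best_prob

# With the fitted model from the question this could be called roughly as
# (test_embedding is a placeholder for one 128-d embedding as a NumPy array):
#   probs = model.predict_proba(test_embedding.reshape(1, -1))[0]
#   name, confidence = label_with_cutoff(probs, model.classes_)

print(label_with_cutoff([0.1, 0.85, 0.05], ["alice", "bob", "carol"]))
print(label_with_cutoff([0.4, 0.35, 0.25], ["alice", "bob", "carol"]))
```

Raising the cutoff rejects more unknown faces but also rejects more genuine matches, so it is a tuning decision rather than a fixed constant.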

Remember that there is a trade-off between the size of your prediction space (more persons, more possibilities) and accuracy.

Upvotes: 1
