Eldar Ron

Reputation: 71

reproducing dlib frontal_face_detector() training

I am trying to reproduce the training process of dlib's frontal_face_detector(). I am using the very same dataset (from http://dlib.net/files/data/dlib_face_detector_training_data.tar.gz) that dlib's author says was used, taking the union of frontal and profile faces plus their reflections.

My problems are:

1. Very high memory usage for the whole dataset (30+ GB).
2. Training on a partial dataset does not yield a very high recall rate: 50-60 percent, compared to frontal_face_detector's 80-90 (testing on a subset of images not used for training).
3. The detectors work badly on low-resolution images and thus fail to detect faces that are more than 1-1.5 meters away.
4. Training run time increases significantly with the SVM's C parameter, which I have to increase to achieve a better recall rate (I suspect this is just an overfitting artifact).

My original motivation for training was:

a. gaining the ability to adapt to the specific environment where the camera is installed, e.g. by hard negative mining;
b. improving detection in depth, plus run time, by reducing the 80x80 window to 64x64 or even 48x48.

Am I on the right path? Do I miss anything? Please help...

Upvotes: 4

Views: 1392

Answers (1)

Davis King

Reputation: 4791

The training parameters used were recorded in a comment in dlib's code here: http://dlib.net/dlib/image_processing/frontal_face_detector.h.html. For reference:

    It is built out of 5 HOG filters. A front looking, left looking, right looking,
    front looking but rotated left, and finally a front looking but rotated right one.

    Moreover, here is the training log and parameters used to generate the filters:
    The front detector:
        trained on mirrored set of labeled_faces_in_the_wild/frontal_faces.xml
        upsampled each image by 2:1
        used pyramid_down<6> 
        loss per missed target: 1
        epsilon: 0.05
        padding: 0
        detection window size: 80 80
        C: 700
        nuclear norm regularizer: 9
        cell_size: 8
        num filters: 78
        num images: 4748
        Train detector (precision,recall,AP): 0.999793 0.895517 0.895368 
        singular value threshold: 0.15

    The left detector:
        trained on labeled_faces_in_the_wild/left_faces.xml
        upsampled each image by 2:1
        used pyramid_down<6> 
        loss per missed target: 2
        epsilon: 0.05
        padding: 0
        detection window size: 80 80
        C: 250
        nuclear norm regularizer: 8
        cell_size: 8
        num filters: 63
        num images: 493
        Train detector (precision,recall,AP): 0.991803  0.86019 0.859486 
        singular value threshold: 0.15

    The right detector:
        trained left-right flip of labeled_faces_in_the_wild/left_faces.xml
        upsampled each image by 2:1
        used pyramid_down<6> 
        loss per missed target: 2
        epsilon: 0.05
        padding: 0
        detection window size: 80 80
        C: 250
        nuclear norm regularizer: 8
        cell_size: 8
        num filters: 66
        num images: 493
        Train detector (precision,recall,AP): 0.991781  0.85782 0.857341 
        singular value threshold: 0.19

    The front-rotate-left detector:
        trained on mirrored set of labeled_faces_in_the_wild/frontal_faces.xml
        upsampled each image by 2:1
        used pyramid_down<6> 
        rotated left 27 degrees
        loss per missed target: 1
        epsilon: 0.05
        padding: 0
        detection window size: 80 80
        C: 700
        nuclear norm regularizer: 9
        cell_size: 8
        num images: 4748
        singular value threshold: 0.12

    The front-rotate-right detector:
        trained on mirrored set of labeled_faces_in_the_wild/frontal_faces.xml
        upsampled each image by 2:1
        used pyramid_down<6> 
        rotated right 27 degrees
        loss per missed target: 1
        epsilon: 0.05
        padding: 0
        detection window size: 80 80
        C: 700
        nuclear norm regularizer: 9
        cell_size: 8
        num filters: 89
        num images: 4748
        Train detector (precision,recall,AP):        1 0.897369 0.897369 
        singular value threshold: 0.15

What the parameters are and how to set them is all explained in the dlib documentation. There is also a paper that describes the training algorithm: Max-Margin Object Detection.
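For anyone trying to reproduce this, the logged parameters map onto dlib's C++ training API roughly as follows. This is a sketch modeled on dlib's fhog_object_detector_ex.cpp example, not the exact script used to build the shipped detector; the XML filename, thread count, and output path are placeholders:

```cpp
#include <dlib/data_io.h>
#include <dlib/image_processing.h>
#include <dlib/image_transforms.h>

using namespace dlib;

int main()
{
    // Load a dataset in dlib's imglab XML format (path is a placeholder).
    dlib::array<array2d<unsigned char>> images;
    std::vector<std::vector<rectangle>> face_boxes;
    load_image_dataset(images, face_boxes, "frontal_faces.xml");

    // "upsampled each image by 2:1" and "trained on mirrored set".
    upsample_image_dataset<pyramid_down<2>>(images, face_boxes);
    add_image_left_right_flips(images, face_boxes);

    // "used pyramid_down<6>", "detection window size: 80 80",
    // "padding: 0", "nuclear norm regularizer: 9".
    typedef scan_fhog_pyramid<pyramid_down<6>> image_scanner_type;
    image_scanner_type scanner;
    scanner.set_detection_window_size(80, 80);
    scanner.set_padding(0);
    scanner.set_nuclear_norm_regularization_strength(9.0);
    // "cell_size: 8" is the FHOG default; "num filters" is not set
    // directly but falls out of the nuclear norm regularization.

    // "C: 700", "epsilon: 0.05", "loss per missed target: 1".
    structural_object_detection_trainer<image_scanner_type> trainer(scanner);
    trainer.set_num_threads(4);
    trainer.set_c(700);
    trainer.set_epsilon(0.05);
    trainer.set_loss_per_missed_target(1);
    trainer.be_verbose();

    object_detector<image_scanner_type> detector = trainer.train(images, face_boxes);
    serialize("frontal_detector.svm") << detector;
}
```

Each of the five filters in frontal_face_detector was presumably trained this way on its own dataset (frontal, left, flipped-left, and the two rotated variants) and the results combined into a single multi-detector.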

Yes, it can take a lot of RAM to run the trainer.

Upvotes: 4
