Improving people detection with OpenCV

Question

I'm trying out OpenCV's people detection sample. After running it on an image (original image available here), this is my result:

I'm using the people detection sample that comes bundled with OpenCV (slightly modified to avoid Visual Studio errors). This is the code that gets executed:

// opencv-sample.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"

#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/objdetect/objdetect.hpp"
#include "opencv2/highgui/highgui.hpp"

#include <stdio.h>   // printf, fprintf, fgets, fopen_s
#include <string.h>  // strlen, strcpy_s
#include <ctype.h>   // isspace

using namespace cv;
using namespace std;

// static void help()
// {
//     printf(
//             "\nDemonstrate the use of the HoG descriptor using\n"
//             "  HOGDescriptor::hog.setSVMDetector(HOGDescriptor::getDefaultPeopleDetector());\n"
//             "Usage:\n"
//             "./peopledetect ( | .txt)\n\n");
// }

int main(int argc, char** argv)
{
    Mat img;
    FILE* f = 0;
    char _filename[1024];

    if (argc == 1)
    {
        printf("Usage: peopledetect ( | .txt)\n");
        return 0;
    }
    img = imread(argv[1]);

    if (img.data)
    {
        strcpy_s(_filename, argv[1]);
    }
    else
    {
        fopen_s(&f, argv[1], "rt");
        if (!f)
        {
            fprintf(stderr, "ERROR: the specified file could not be loaded\n");
            return -1;
        }
    }

    HOGDescriptor hog;
    hog.setSVMDetector(HOGDescriptor::getDefaultPeopleDetector());
    namedWindow("people detector", 1);

    for (;;)
    {
        char* filename = _filename;
        if (f)
        {
            if (!fgets(filename, (int)sizeof(_filename) - 2, f))
                break;
            //while(*filename && isspace(*filename))
            //  ++filename;
            if (filename[0] == '#')
                continue;
            int l = (int)strlen(filename);
            while (l > 0 && isspace(filename[l - 1]))
                --l;
            filename[l] = '\0';
            img = imread(filename);
        }
        printf("%s:\n", filename);
        if (!img.data)
            continue;

        fflush(stdout);
        vector<Rect> found, found_filtered;
        double t = (double)getTickCount();
        // run the detector with default parameters. to get a higher hit-rate
        // (and more false alarms, respectively), decrease the hitThreshold and
        // groupThreshold (set groupThreshold to 0 to turn off the grouping completely).
        hog.detectMultiScale(img, found, 0, Size(8, 8), Size(32, 32), 1.05, 2);
        t = (double)getTickCount() - t;
        printf("detection time = %gms\n", t*1000. / cv::getTickFrequency());
        size_t i, j;
        for (i = 0; i < found.size(); i++)
        {
            Rect r = found[i];
            for (j = 0; j < found.size(); j++)
                if (j != i && (r & found[j]) == r)
                    break;
            if (j == found.size())
                found_filtered.push_back(r);
        }
        for (i = 0; i < found_filtered.size(); i++)
        {
            Rect r = found_filtered[i];
            // the HOG detector returns slightly larger rectangles than the real objects.
            // so we slightly shrink the rectangles to get a nicer output.
            r.x += cvRound(r.width*0.1);
            r.width = cvRound(r.width*0.8);
            r.y += cvRound(r.height*0.07);
            r.height = cvRound(r.height*0.8);
            rectangle(img, r.tl(), r.br(), cv::Scalar(0, 255, 0), 3);
        }
        imshow("people detector", img);
        imwrite("detected_ppl.jpg", img);
        int c = waitKey(0) & 255;
        if (c == 'q' || c == 'Q' || !f)
            break;
    }
    if (f)
        fclose(f);
    return 0;
}

I would like to improve this result so that at least 9 of the 11 people in this image are detected. How can I do that? Do I need to train a separate SVM? Is there a better library I can use? Or do I need to resort to deep learning?

foundry · Accepted Answer

This is an improvement I achieved after spending not much time with the sample code.

What I did
- tweak some of the parameters in detectMultiScale
- adjust the filter to eliminate largely-overlapping rectangles

I would say I get 9/11 hits, with one false positive and two false negatives.

Which is all very well, but this is a single static image. Tweaking parameters to fit a single sample leads to overfitting: you get exactly the response you are after on that one sample, but poor generalisation.

I strongly suggest that you get to know the OpenCV algorithms inside-out before ditching them for 'better' libraries and 'deep learning' approaches. If you don't know the strengths and weaknesses of this algorithm, you won't be in any position to compare it with approaches from other libraries.

update
This is the code I used to achieve the result. It is closely derived from the peopledetect.cpp OpenCV sample. You will need to make a few changes, as I am using a custom image reading function which won't be relevant for you.

I have added a slider for the 'scaleFactor' parameter so you can easily see the effect of changing it. detectMultiScale runs the classifier window over the image in multiple passes at different sizes. The scaleFactor parameter controls the size step between passes, and small changes to it make a huge difference to the output. However, it is a little meaningless to tune these parameters on a single still image. You really need to let the detector loose on a representative test set from your target data in order to assess the suitability of this (or any other) algorithm.
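
To see why scaleFactor matters so much, it helps to list the scales a multi-scale detector would visit. The sketch below is a simplified model, not OpenCV's exact loop: it assumes the image is shrunk by scaleFactor on each pass until the 64x128 default HOG window no longer fits. `pyramidScales` and its parameters are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// List the scales at which a winWidth x winHeight window still fits the image.
// Simplified model of a detection pyramid; maxLevels caps the number of passes.
std::vector<double> pyramidScales(int imgWidth, int imgHeight, double scaleFactor,
                                  int winWidth = 64, int winHeight = 128,
                                  std::size_t maxLevels = 64) {
    std::vector<double> scales;
    double scale = 1.0;
    while (scales.size() < maxLevels &&
           imgWidth / scale >= winWidth &&
           imgHeight / scale >= winHeight) {
        scales.push_back(scale);
        if (scaleFactor <= 1.0) break;  // a factor of 1 or less would never terminate
        scale *= scaleFactor;
    }
    return scales;
}
```

Under this model a 640x480 image is scanned at 28 scales with a factor of 1.05 but only 4 with a factor of 1.5, so a person whose size falls between two coarse steps is easily missed, while finer steps cost more time and tend to produce more candidate rectangles.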
