rjoniuqa

Reputation: 197

OpenCV - Using SVM and HOG for person detection

I'm aware of the steps needed to accomplish this task:

  1. Collect the training sets (positive and negative sets).
  2. Extract the HOG descriptor for each image to be used for training the SVM (currently class label '1' for positive and class label '-1' for negative).
  3. Set the trained SVM on the HOGDescriptor and use detect/detectMultiScale.

I have done all of the steps above. I'm just confused: which class does HOGDescriptor.detect/detectMultiScale detect? Does it detect only the positive class label (1)?

Upvotes: 3

Views: 6159

Answers (1)

Kornel

Reputation: 5354

In computer vision, visual descriptors or image descriptors (e.g. HOG) describe the visual features of the contents of an image: elementary characteristics such as shape, color, texture or motion. A HOG descriptor therefore only characterizes the scene shown in the image, e.g. a pedestrian walking on the street; it simply counts occurrences of gradient orientations in localized portions of the image. An example HOG descriptor is shown below:

[Image: example HOG descriptor]
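To make "counting occurrences of gradient orientation" concrete, here is a minimal sketch for a single cell. It is a simplification, not OpenCV's implementation: the helper name cellHistogram, the 9 bins of 20 degrees and the magnitude-weighted voting are assumptions for illustration, and real HOG additionally interpolates between bins and normalizes over blocks of cells.

```cpp
#include <array>
#include <cmath>
#include <vector>

// Hypothetical helper (not an OpenCV function): builds a 9-bin histogram of
// unsigned gradient orientations (0..180 degrees) for one grayscale cell.
// Each pixel votes for its orientation bin, weighted by gradient magnitude.
std::array<double, 9> cellHistogram(const std::vector<std::vector<double>>& cell) {
    std::array<double, 9> hist{};  // all bins start at zero
    const double pi = std::acos(-1.0);
    const int rows = static_cast<int>(cell.size());
    const int cols = rows > 0 ? static_cast<int>(cell[0].size()) : 0;

    // Central differences need a one-pixel border, so edge pixels are skipped.
    for (int y = 1; y + 1 < rows; ++y) {
        for (int x = 1; x + 1 < cols; ++x) {
            const double gx = cell[y][x + 1] - cell[y][x - 1];
            const double gy = cell[y + 1][x] - cell[y - 1][x];
            const double magnitude = std::hypot(gx, gy);
            if (magnitude == 0.0) continue;  // flat region: no orientation

            double angle = std::atan2(gy, gx) * 180.0 / pi;  // -180..180
            if (angle < 0.0) angle += 180.0;     // fold to unsigned orientation
            if (angle >= 180.0) angle -= 180.0;  // 180 is the same as 0

            int bin = static_cast<int>(angle / 20.0);  // 20 degrees per bin
            if (bin > 8) bin = 8;                      // guard against rounding
            hist[bin] += magnitude;
        }
    }
    return hist;
}
```

A cell whose intensity only increases from left to right puts all of its votes into the first bin (orientation 0), while a top-to-bottom ramp votes for the bin containing 90 degrees.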

SVMs are a set of supervised learning methods used for classification, regression and outlier detection. Originally, however, the SVM was a technique for building an optimal binary (2-class) classifier, and that is its role here: the SVM decides what the descriptors mean. In other words, the output of HOG is the input of the SVM, and the output of the SVM is +1 or -1.
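The "descriptor in, +1 or -1 out" step can be sketched for a linear SVM as follows. This is only an illustration of the decision rule f(x) = w·x + b: the helper name classifyDescriptor and the choice to treat a score of exactly 0 as positive are assumptions, and OpenCV's internal sign and bias conventions are not reproduced here.

```cpp
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of OpenCV): evaluates a linear SVM decision
// function f(x) = w.x + b on one descriptor and maps the score to the class
// labels used in the question (+1 for positive, -1 for negative).
int classifyDescriptor(const std::vector<float>& descriptor,
                       const std::vector<float>& weights,
                       float bias) {
    if (descriptor.size() != weights.size()) {
        throw std::invalid_argument("descriptor and weights differ in length");
    }
    // Dot product of descriptor and weights, starting from the bias term.
    const float score = std::inner_product(descriptor.begin(), descriptor.end(),
                                           weights.begin(), bias);
    // Assumption: a score of exactly zero is counted as positive.
    return score >= 0.0f ? +1 : -1;
}
```

For example, classifyDescriptor({1.0f, 2.0f}, {0.5f, -0.1f}, 0.0f) yields +1 because the score is 0.3, which lies on the positive side of the decision boundary.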

OpenCV provides an interface that hides this operation, so the full object detection can be done with a single function call. This is what HOGDescriptor::detectMultiScale() does: it performs object detection with a multi-scale window. Once a cv::HOGDescriptor hog instance has been declared, the coefficients of the SVM classifier must be set with:

hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());

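detectMultiScale() then performs the full object detection (descriptor extraction and binary classification together) and returns the bounding box of each candidate, that is, only the windows whose SVM score exceeds the hit threshold and are therefore classified as the positive class: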

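std::vector<cv::Rect> found;
// Arguments after found: hitThreshold = 0, winStride = 8x8, padding = 32x32, scale = 1.05, finalThreshold = 2
hog.detectMultiScale(frame, found, 0, cv::Size(8,8), cv::Size(32,32), 1.05, 2);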
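To illustrate what the "multi-scale window" and the 1.05 scale argument mean, the sketch below lists the scale factors a detector of this kind could visit: the image is shrunk step by step until the detection window no longer fits. It is a simplified illustration rather than OpenCV's actual code; the helper name pyramidScales and the maxLevels cap are assumptions.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper (not an OpenCV function): returns the scale factors at
// which a fixed-size detection window still fits into the downscaled image.
std::vector<double> pyramidScales(int imgW, int imgH, int winW, int winH,
                                  double scaleStep, int maxLevels) {
    std::vector<double> scales;
    double scale = 1.0;
    for (int level = 0; level < maxLevels; ++level) {
        // Size of the image after shrinking it by the current scale factor.
        const long w = std::lround(imgW / scale);
        const long h = std::lround(imgH / scale);
        if (w < winW || h < winH) break;  // window no longer fits: stop
        scales.push_back(scale);
        if (scaleStep <= 1.0) break;      // no shrinking possible: one level only
        scale *= scaleStep;
    }
    return scales;
}
```

With a 64x128 window (the window size of the default people detector), a 128x256 image and a step of 2.0, this yields the scales 1 and 2. A step as small as 1.05 visits many more levels, which finds people at more sizes at the cost of more computation.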

Upvotes: 3
