Mara

Reputation: 31

What exactly is the output of the SURF algorithm and how can I use them for classification (SVM, etc.)?

I am working on a project that tracks humans in aerial videos. One of the algorithms we will use is SURF. I understand that SURF detects interest points, but I'm quite confused about what comes after that. How exactly can I use the interest points for classification? I want to identify which detected objects in the video are humans and which are other objects, so of course I need a training set, but what will it contain? I've read somewhere that bag-of-words (BoW) should be used, but are there other ways of using the SURF features? If I read the original SURF paper by Herbert Bay correctly, it doesn't say how the features are extracted as output or how they should be prepared for classification.

I'm really confused. Please help. Thank you!

Upvotes: 1

Views: 1779

Answers (3)

Herbert Bay

Reputation: 225

Sorry, I just saw this now. SURF has two parts:

  1. The interest points which are extracted using the determinant of the Hessian matrix
  2. A descriptor vector describing the neighbourhood of each interest point.

For classifiers you are interested in part 2. The output format of the original SURF implementation is:

(1 + length of descriptor) number of points x y a b c l des x y a b c l des ...

where:

  - x, y = position of the interest point
  - a, b, c = entries of the second moment matrix [a b; b c]. SURF only has circular regions, hence b = 0 and a = c -> radius = 1 / a^2
  - l = sign of the Laplacian (-1 or +1). This value is very useful, as it tells you whether the detected blob is dark on a light background (-1) or light on a dark background (+1)
  - des = the descriptor vector itself. See the paper for more.
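A minimal sketch of reading this text output into per-point records, assuming the layout described above (a header holding 1 + descriptor length and the point count, then the six fields and the descriptor values for each point). The function name and the synthetic sample string are made up for illustration:

```python
def parse_surf_output(text):
    """Parse the original SURF binary's text output (as described above)
    into a list of per-point dicts."""
    tokens = [float(t) for t in text.split()]
    desc_len = int(tokens[0]) - 1      # first value is 1 + descriptor length
    n_points = int(tokens[1])
    points = []
    pos = 2
    for _ in range(n_points):
        x, y, a, b, c, lap = tokens[pos:pos + 6]
        des = tokens[pos + 6:pos + 6 + desc_len]
        points.append({"x": x, "y": y, "a": a, "b": b, "c": c,
                       "laplacian": int(lap), "descriptor": des})
        pos += 6 + desc_len
    return points

# Synthetic example with a 4-value descriptor (so the header starts with 5):
sample = ("5 2  10 20 0.5 0 0.5 1  0.1 0.2 0.3 0.4"
          "  30 40 0.25 0 0.25 -1  0.5 0.6 0.7 0.8")
pts = parse_surf_output(sample)
```

Once parsed, the `descriptor` entries are the vectors you would feed to a classifier; the `laplacian` sign can be used to skip comparing dark-on-light blobs against light-on-dark ones.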

Hope that helps.

Upvotes: 0

LovaBill

Reputation: 5139

Let's say you have an image and you divide it into smaller rectangular areas, called patches. Each patch is a rectangular area (x, y, width, height). Say you want to describe the colors inside a patch: you calculate its histogram, and the result is a vector of numbers (e.g. [5 11 2 4 5]). This output vector is a description vector (a descriptor). If you extract descriptors from all patches, the method is called dense sampling. If only some patches are important, you use keypoints to specify which ones are significant and which are not.

Keypoints are simply points of greater significance than the other points in an image. A descriptor is a vector that encodes the color/shape/texture information of a small area (a patch).
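The patch-plus-histogram idea above can be sketched with plain numpy (this is an illustrative toy descriptor, not SURF; the function name and parameters are made up). Each non-overlapping patch of a grayscale image is described by its intensity histogram, which is the dense-sampling setup:

```python
import numpy as np

def dense_histogram_descriptors(image, patch=8, bins=16):
    """Slide a non-overlapping patch grid over `image` and return one
    intensity-histogram descriptor per patch, stacked row-wise."""
    h, w = image.shape
    descriptors = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            region = image[y:y + patch, x:x + patch]
            hist, _ = np.histogram(region, bins=bins, range=(0, 256))
            descriptors.append(hist)
    return np.array(descriptors)

img = np.random.randint(0, 256, (32, 32), dtype=np.uint8)
descs = dense_histogram_descriptors(img)   # 4x4 patch grid -> 16 descriptors
```

Replacing the fixed grid with detected keypoints, and the histogram with a SURF-style gradient description, gives you the sparse keypoint/descriptor pipeline discussed above.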

Edit: The output of SURF in OpenCV is a cv::Mat in which each row is one descriptor of 64 values (L2-normalized). You can compare two L2-normalized vectors with the L2 norm (Euclidean distance).

Edit 2: A classifier is a different story. I suggest you study the tutorial http://docs.opencv.org/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html, keeping in mind that where the tutorial uses a 2D point, in your case you will use a descriptor of 64 values.
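To make that substitution concrete without pulling in an SVM library, here is a dependency-free stand-in: a nearest-centroid classifier over labelled 64-value descriptors. Everything here is synthetic and hypothetical (the two classes, the Gaussian training data, the labels 0 = "human", 1 = "other"); a real pipeline would train the OpenCV SVM from the tutorial on real SURF descriptors instead:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "training descriptors" for two made-up classes.
human_descs = rng.normal(0.2, 0.05, (50, 64))   # hypothetical class 0: "human"
other_descs = rng.normal(0.8, 0.05, (50, 64))   # hypothetical class 1: "other"

# Training reduces to one mean descriptor per class.
centroids = np.stack([human_descs.mean(axis=0), other_descs.mean(axis=0)])

def classify(desc):
    """Return 0 ('human') or 1 ('other'): the index of the nearest centroid."""
    return int(np.argmin(np.linalg.norm(centroids - desc, axis=1)))

label = classify(rng.normal(0.2, 0.05, 64))
```

The point is only the shape of the problem: each training sample is a 64-dimensional descriptor plus a label, exactly what an SVM's `train` call expects in place of the tutorial's 2D points.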

Upvotes: 1

nickGR

Reputation: 108

I am also working on an object detection project. I'm new to all of this, but this might be helpful: http://cs229.stanford.edu/proj2011/SchmittMcCoy-ObjectClassificationAndLocalizationUsingSURFDescriptors.pdf

Upvotes: 0
