Reputation: 8889
Question
What's the fastest open-source HOG extraction code for multicore CPUs?
Motivation
I'm working on a real-time object detection application. Specifically, I've developed a variant of Deformable Parts Model cascades, targeting 30fps object detection. I've reached the point where extracting HOG features costs more than the rest of my pipeline combined. I'm using the [Felzenszwalb, Girshick, et al] parameters for HOG extraction: a multiresolution pyramid of HOG descriptors, where each descriptor has a total of 32 bins covering orientation and a few other cues.
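For readers unfamiliar with the 32-bin descriptor, here is a minimal sketch of how the channels are usually split, as far as I understand the publicly released DPM code: 18 contrast-sensitive orientation bins, 9 contrast-insensitive orientation bins, 4 texture/energy features and 1 truncation flag. The channel names and the helper `channel_slices` are hypothetical and only serve the illustration.

```python
# Assumed per-cell layout of the 32-dimensional descriptor (names are illustrative).
FEATURE_LAYOUT = [
    ("contrast_sensitive_orientation", 18),
    ("contrast_insensitive_orientation", 9),
    ("texture_energy", 4),
    ("truncation_flag", 1),
]

def channel_slices(layout):
    """Map each channel name to its slice in the per-cell feature vector.

    Returns (slices, total_length).
    """
    slices = {}
    start = 0
    for name, width in layout:
        slices[name] = slice(start, start + width)
        start += width
    return slices, start

if __name__ == "__main__":
    slices, total = channel_slices(FEATURE_LAYOUT)
    print("descriptor length per cell:", total)
    for name, sl in slices.items():
        print(name, sl.start, sl.stop)
```

Keeping the layout in one table like this makes it easier to compare implementations that emit the channels in a different order or omit some of them.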
Goals
I'd like to do multiscale HOG feature extraction at 60fps (roughly 16ms per frame) for 640x480 images on a multicore CPU.
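To put that budget in perspective, here is a rough sketch of how much work one frame involves. It assumes 8x8-pixel cells, 10 pyramid levels per octave, and a pyramid that stops once the shorter image side holds fewer than 5 cells. These are the defaults I believe the released DPM code uses, but treat them as assumptions; the function names are hypothetical.

```python
import math

def frame_budget_ms(fps):
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps

def pyramid_scales(width, height, cell_size=8, levels_per_octave=10, min_cells=5):
    """Scale factors, finest first, down to the smallest level that still
    holds `min_cells` cells along the shorter image side."""
    step = 2.0 ** (1.0 / levels_per_octave)
    shortest = min(width, height)
    ratio = shortest / float(min_cells * cell_size)
    count = 1 + int(math.floor(math.log(ratio) / math.log(step)))
    return [step ** -i for i in range(count)]

def cells_at_scale(width, height, scale, cell_size=8):
    """Number of whole HOG cells in the image resized by `scale`."""
    w = int(round(width * scale))
    h = int(round(height * scale))
    return (w // cell_size) * (h // cell_size)

if __name__ == "__main__":
    scales = pyramid_scales(640, 480)
    total_cells = sum(cells_at_scale(640, 480, s) for s in scales)
    budget = frame_budget_ms(60)
    print("levels:", len(scales))
    print("total cells:", total_cells)
    print("budget per frame: %.2f ms" % budget)
    print("budget per cell: %.3f us" % (1000.0 * budget / total_cells))
```

The per-cell figure is only a back-of-the-envelope number: it ignores the cost of resizing the image for each level, which can be a significant share of the total.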
Related Work
I've benchmarked a few off-the-shelf multiscale HOG implementations on a 6-core Intel 3930k CPU. For a 640x480 image, I observe the following performance numbers:
I've also experimented with the OpenCV HOG extraction code. The OpenCV version works, but it seems to be hard-coded for the Dalal-Triggs HOG setup, and it doesn't seem to let me use the same HOG parameters (normalization scheme, binary position features, etc.) as [Felzenszwalb, Girshick, et al]. The OpenCV version also doesn't natively support multiscale HOG, though you could do the downsampling yourself and call OpenCV HOG for each scale. I don't remember what the OpenCV HOG performance looked like.
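The "downsample yourself and call a single-scale extractor per level" approach could look roughly like the sketch below. It is dependency-free on purpose: `downsample` is a naive nearest-neighbour resize on a list-of-rows image, and `extract_single_scale` is a hypothetical callable standing in for whatever single-scale HOG routine you use (OpenCV's or otherwise). In real code you would want a proper anti-aliased resize rather than nearest-neighbour.

```python
def downsample(image, scale):
    """Nearest-neighbour resize of a 2-D list-of-rows image (0 < scale <= 1)."""
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, int(round(src_h * scale)))
    dst_w = max(1, int(round(src_w * scale)))
    return [
        [
            image[min(src_h - 1, int(y / scale))][min(src_w - 1, int(x / scale))]
            for x in range(dst_w)
        ]
        for y in range(dst_h)
    ]

def multiscale_features(image, scales, extract_single_scale):
    """Run a single-scale extractor on each resized copy of `image`.

    Returns a list of (scale, features) pairs in the order of `scales`.
    """
    return [(s, extract_single_scale(downsample(image, s))) for s in scales]

if __name__ == "__main__":
    # Tiny 4x4 test image; the "extractor" just reports the level size.
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    report_size = lambda level: (len(level), len(level[0]))
    for scale, feats in multiscale_features(image, [1.0, 0.5], report_size):
        print(scale, feats)
```

Since the levels are independent of each other, this loop is also the natural place to spread work across cores, for example one pyramid level per task.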
Final Thoughts
Upvotes: 17
Views: 13890
Reputation: 613
Have a look at the following implementation: HoG SSE
It fits your time requirements. It is written in C and uses 128-bit SIMD (SSE) instructions.
The code can also be customized further, depending on the normalization strategy and output type you need.
I would be glad to hear your feedback so that I can improve this code.
Upvotes: 1