Reputation: 851
I'm currently working on a vision system for a UAV I am building. The goal of the system is to find target objects, which are rather well defined (see below), in a video stream that will be a 2-D flyover view of the ground. So far I have tried training and using a Haar-like feature based cascade, a la Viola Jones, to do the detection. I am training it with 5000+ images of the targets at different angles (perspective shifts) and ranges (sizes in the frame), but only 1900 "background" images. This does not yield good results at all, as I cannot find a suitable number of stages to the cascade that balances few false positives with few false negatives.
I am looking for advice from anyone who has experience in this area, as to whether I should: 1) ditch the cascade, in favor of something more suitable to objects defined by their outline and color (which I've read the VJ cascade is not). 2) improve my training set for the cascade, either by adding positives, background frames, organizing/shooting them better, etc. 3) Some other approach I can't fathom currently.
A description of the targets:
My goal is use something very fast, such as the VJ cascade, to find possible objects and their associated bounding box, and then pass these on to finer processing routines to determine the properties (color of the object and AN, value of the AN, actual shape, and GPS location). Any advice you can give me towards completing this goal would be much appreciated. The source code I currently have is a little lengthy for post here, but is freely available should you like to see it for reference. Thanks in advance!
-JB
Upvotes: 1
Views: 3378
Reputation: 441
May be solved but I came across a paper recently with an open-source license for generic object detection using normalised gradient features : http://mmcheng.net/bing/comment-page-9/
The details of the algorithms performance against illumination, rotation and scale may require a little digging. I can't remember on the top of my head where the original paper is.
Upvotes: 1
Reputation: 809
I would recommend ditching Haar classification, since you know a lot about your objects. In these cases you should start by checking what features you can use:
1) overhead flight means, as you said, you can basically treat these as fixed shapes on a 2D plane. There will be scaling, rotations and some minor affine transformations, which depends a lot on how wide-angled your camera is. If it isn't particularly wide-angled, that part can probably be ignored. Also, you probably know your altitude, by which you can probably also make very good assumption on the target size (scaling).
2) You know the colors, which also makes it quite easy to find objects. If these are very defined as primary color, then you can just filter the image based on color and find those contours. If you want to do something a little more advanced (which to me doesn't seem necessary though...) you can do a backprojection, which in my experience is very effective and fast. Note, if you're creating the objects, it would be better to use Red Green and Blue instead of primary colors (red green and yellow). Then you can simply split the image into it's respective channels and use a very high threshold.
3) You know the geometric shapes. I've never done this myself, but as far as I know, the options are using moments or using Hough transforms (although openCV only has hough algorithms for lines and circles, so you'd have to write your own for other shapes...). You might already have sufficiently good results without this step though...
If you want more specific recommendations, it would be very useful to upload a couple sample images. :)
Upvotes: 2