Reputation: 2257
I am building an R-CNN detection network using TensorFlow's Object Detection API.
My goal is to detect bounding boxes for animals in outdoor videos. Most frames contain no animals and show only a dynamic background.
Most tutorials focus on training with custom labels, but make no mention of negative training samples. How does this class of detectors deal with images that contain no objects of interest? Does it just output low probabilities everywhere, or will it be forced to draw a bounding box somewhere in the image?
My current plan is to use traditional background subtraction in OpenCV to generate candidate frames and pass them to the trained network. Should I also include a 'background' class of bounding boxes as negative data?
A final option would be to use OpenCV for background subtraction, the R-CNN to generate bounding boxes, and then a classification model on the crops to distinguish animals from background.
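For reference, a minimal sketch of the background-subtraction pre-filter I have in mind, assuming OpenCV's MOG2 subtractor (the input file name and the motion-area threshold are placeholders I would tune):

```python
import cv2

cap = cv2.VideoCapture("outdoor_video.mp4")  # placeholder input path
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=False)

candidate_frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)  # foreground mask, 0 = background
    # Keep only frames with enough foreground pixels to be worth detecting on.
    if cv2.countNonZero(mask) > 500:
        candidate_frames.append(frame)

cap.release()
```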
Upvotes: 13
Views: 7749
Reputation: 302
I have found success by scanning my ground truth, copying each box area plus a margin, pasting tilings of those box areas onto new background images (guaranteed to contain no objects), and creating corresponding XML files that assert the box categories.
I collect non-objects as "uncategorised" boxes, usually taken from glitches in the output of my latest model. These are tiled just like the "is-objects", but are not recorded in the XML files.
I produce tilings at various scales to build each new training set.
Further explanation and sample Python code are here: https://github.com/brentcroft/ground-truth-productions
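To give a self-contained flavour of the tiling step, here is a rough sketch (not the repo's actual code; the helper name and layout logic are illustrative, and it assumes OpenCV-style numpy images):

```python
def tile_crops(source_img, boxes, background_img, margin=10):
    """Paste ground-truth crops (plus a margin) in a grid onto a clean background.

    boxes: list of (xmin, ymin, xmax, ymax) rectangles in source_img.
    Images are numpy arrays as returned by cv2.imread.
    """
    canvas = background_img.copy()
    x = y = row_h = 0
    for (xmin, ymin, xmax, ymax) in boxes:
        crop = source_img[max(0, ymin - margin):ymax + margin,
                          max(0, xmin - margin):xmax + margin]
        h, w = crop.shape[:2]
        if w > canvas.shape[1] or h > canvas.shape[0]:
            continue                      # crop larger than the canvas, skip
        if x + w > canvas.shape[1]:       # wrap to the next row
            x, y, row_h = 0, y + row_h, 0
        if y + h > canvas.shape[0]:       # canvas full
            break
        canvas[y:y + h, x:x + w] = crop
        # A real pipeline would also write an XML <object> entry for this
        # pasted box (offset by x, y) when the crop is a labelled object,
        # and omit the entry for "uncategorised" crops.
        x += w
        row_h = max(row_h, h)
    return canvas
```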
Upvotes: 2
Reputation: 1558
In general it's not necessary to explicitly include "negative images": these detection models already use the parts of each training image that don't belong to the annotated objects as negatives.
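To make that concrete, here is an illustrative sketch of the anchor-matching rule behind it (simplified; the 0.3/0.7 IoU thresholds are the Faster R-CNN defaults): proposals that overlap no ground-truth box are automatically labelled background, so the unannotated regions of your training images supply the negatives.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def label_anchors(anchors, gt_boxes, neg_thresh=0.3, pos_thresh=0.7):
    """Assign training labels to anchors, roughly the way an RPN does."""
    labels = []
    for anchor in anchors:
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thresh:
            labels.append(1)    # positive: an annotated object
        elif best < neg_thresh:
            labels.append(0)    # negative: background, no extra data needed
        else:
            labels.append(-1)   # ambiguous: ignored by the loss
    return labels
```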
Upvotes: 6
Reputation: 77860
If you expect your model to differentiate between "found a figure" and "no figure", then you will almost certainly need to train it on negative examples. Label these as "no image". In the "no image" case, yes, use the entire image as the bounding box; don't suggest that the model recognize anything smaller.
In "no image" cases, you may get a smaller bounding box, but that doesn't matter: in inference, you'll simply ignore whatever box is returned for "no image".
Of course, the critical issue here is to try it out, and see how well it works for you.
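As a concrete illustration of that suggestion, here is a small sketch that writes a Pascal VOC-style XML annotation whose single "no image" box spans the whole frame (the class name, file names, and dimensions are placeholders):

```python
from xml.etree import ElementTree as ET

def write_negative_annotation(filename, width, height, out_path):
    """Write an annotation with one full-frame "no image" box."""
    ann = ET.Element("annotation")
    ET.SubElement(ann, "filename").text = filename
    size = ET.SubElement(ann, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    obj = ET.SubElement(ann, "object")
    ET.SubElement(obj, "name").text = "no image"  # the negative class label
    box = ET.SubElement(obj, "bndbox")
    ET.SubElement(box, "xmin").text = "1"
    ET.SubElement(box, "ymin").text = "1"
    ET.SubElement(box, "xmax").text = str(width)   # full-frame bounding box
    ET.SubElement(box, "ymax").text = str(height)
    ET.ElementTree(ann).write(out_path)

write_negative_annotation("empty_frame.jpg", 1280, 720, "empty_frame.xml")
```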
Upvotes: 3