Reputation: 51
We need to build a custom dataset to train a CNN for object detection, so we are annotating the objects to detect with bounding boxes. I consulted several object detection labeling guides, such as the one for PASCAL. However, we ran into a labeling question.
If we want to label people in the dataset images, do we need to label every visible object (= person) in a picture? If we skip some objects (= people) in a picture, does that affect object detection? I added two labeling examples. Image (1) shows all visible people in an image labeled. In Image (2), we labeled only some of the people in the image.
Does labeling as in Image (2) hurt object detection? If it does, we will label as many visible objects as possible in each image.
(Image 1) Labeling all visible objects in a picture
(Image 2) Labeling some visible objects in a picture
Upvotes: 2
Views: 2293
Reputation: 866
Object detection models usually consist of two basic building blocks:
The first block generates various region proposals. As its name suggests, a region proposal is a candidate region that might contain an object. The second block receives every region proposal and classifies it.
If you leave a true positive object in the image unlabeled, you force the object detection model to treat that object as background. This heavily affects training. Think about it for a moment: you are asking the model to classify the same kind of object in two different ways, as a person where it is labeled and as background where it is not.
In conclusion, you have to label every true positive object for the model.
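To make the effect concrete, here is a minimal sketch of how training targets are commonly derived from annotations: each proposal is matched against the labeled boxes by intersection over union (IoU), and anything without a sufficient match becomes background. The function names, the box coordinates, and the 0.5 threshold are assumptions chosen for illustration, not taken from any particular detector.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_targets(proposals, gt_boxes, pos_thresh=0.5):
    """Mark a proposal 'person' if it overlaps a labeled box enough, else 'background'."""
    targets = []
    for p in proposals:
        best = max((iou(p, g) for g in gt_boxes), default=0.0)
        targets.append("person" if best >= pos_thresh else "background")
    return targets


# Hypothetical image with two visible people; each proposal covers one of them.
proposals = [(10, 10, 50, 100), (200, 20, 240, 110)]
full_labels = [(10, 10, 50, 100), (200, 20, 240, 110)]  # both people labeled
partial_labels = [(10, 10, 50, 100)]                    # second person skipped

targets_full = assign_targets(proposals, full_labels)
targets_partial = assign_targets(proposals, partial_labels)
print(targets_full)     # ['person', 'person']
print(targets_partial)  # ['person', 'background']
```

With the partial labels, the proposal that sits exactly on the second person is trained as a background example, which is the contradictory signal described above.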
Upvotes: 4
Reputation: 6386
Yes, it is important. If you skip some people, the network will only partially learn how to detect a person and regress their location. The network may tolerate a few labeling errors, but not as many as in your second example image.
To train an accurate network you need to label every visible object instance, and if you want your network to be robust to occlusion you should label partially occluded objects too.
You can easily verify this behaviour by training two networks: one with all labels and the other one with half of them.
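One way to set up that comparison is to derive the second training set from the fully labeled one by randomly discarding boxes. The sketch below assumes the annotations are held in a plain dictionary mapping an image name to a list of boxes; that layout, the file names, and the `drop_labels` helper are hypothetical, so adapt them to whatever annotation format you actually use.

```python
import random


def drop_labels(annotations, keep_fraction=0.5, seed=0):
    """Return a copy of the annotations keeping only a random fraction of boxes per image."""
    rng = random.Random(seed)  # fixed seed so the reduced set is reproducible
    reduced = {}
    for image_id, boxes in annotations.items():
        n_keep = int(len(boxes) * keep_fraction)
        reduced[image_id] = rng.sample(boxes, n_keep)
    return reduced


# Hypothetical fully labeled dataset: image name -> list of (x1, y1, x2, y2) boxes.
full = {
    "img_001.jpg": [(10, 10, 50, 100), (60, 15, 95, 105), (120, 20, 160, 110), (200, 20, 240, 110)],
    "img_002.jpg": [(30, 40, 70, 130), (150, 35, 190, 125)],
}
half = drop_labels(full, keep_fraction=0.5)

for image_id in full:
    print(image_id, len(full[image_id]), "->", len(half[image_id]))
```

Train one network on `full` and one on `half` with otherwise identical settings, then evaluate both against the fully labeled validation images.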
Upvotes: 1