Reputation: 827
The TensorFlow Object Detection API requires TFRecord examples with bounding-box annotations, like so:
{
'image/height': 1800,
'image/width': 2400,
'image/filename': 'image1.jpg',
'image/source_id': 'image1.jpg',
'image/encoded': ACTUAL_ENCODED_IMAGE_DATA_AS_BYTES,
'image/format': 'jpeg',
'image/object/bbox/xmin': [0.7255949630314233, 0.8845598428835489],
'image/object/bbox/xmax': [0.9695875693160814, 1.0000000000000000],
'image/object/bbox/ymin': [0.5820120073891626, 0.1829972290640394],
'image/object/bbox/ymax': [1.0000000000000000, 0.9662484605911330],
'image/object/class/text': ['Cat', 'Dog'],
'image/object/class/label': [1, 2]
}
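A minimal sketch of how a record with these fields could be written (assuming TensorFlow 2.x; the file paths and helper names here are placeholders, not part of the API spec):

import tensorflow as tf

def bytes_feature(values):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))

def int64_feature(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

def float_feature(values):
    return tf.train.Feature(float_list=tf.train.FloatList(value=values))

with open('image1.jpg', 'rb') as f:  # placeholder path
    encoded_jpg = f.read()

example = tf.train.Example(features=tf.train.Features(feature={
    'image/height': int64_feature([1800]),
    'image/width': int64_feature([2400]),
    'image/filename': bytes_feature([b'image1.jpg']),
    'image/source_id': bytes_feature([b'image1.jpg']),
    'image/encoded': bytes_feature([encoded_jpg]),
    'image/format': bytes_feature([b'jpeg']),
    'image/object/bbox/xmin': float_feature([0.7255949630314233, 0.8845598428835489]),
    'image/object/bbox/xmax': float_feature([0.9695875693160814, 1.0]),
    'image/object/bbox/ymin': float_feature([0.5820120073891626, 0.1829972290640394]),
    'image/object/bbox/ymax': float_feature([1.0, 0.9662484605911330]),
    'image/object/class/text': bytes_feature([b'Cat', b'Dog']),
    'image/object/class/label': int64_feature([1, 2]),
}))

with tf.io.TFRecordWriter('train.record') as writer:  # placeholder output path
    writer.write(example.SerializeToString())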
But I have a dataset of pre-cropped images (each image shows only the object to be classified). Would there be any downside in the training process to providing pre-cropped image data with xmin/ymin of 0 and xmax/ymax of the cropped image size? My main concern is whether the training system might otherwise use contextual data near the cropped selections.
My question would probably be better phrased as: "Do TensorFlow models potentially use contextual details near the locations selected in TFRecord files for training?"
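Concretely, since the bbox fields above are normalized to [0, 1], a full-frame box for a pre-cropped image would look something like this (a sketch; float_feature is the same kind of helper as above, just wrapping tf.train.FloatList):

import tensorflow as tf

def float_feature(values):
    return tf.train.Feature(float_list=tf.train.FloatList(value=values))

# One object per pre-cropped image, with the box covering the whole frame.
# In normalized coordinates, "the cropped image size" becomes 1.0 rather than
# the pixel dimensions.
full_frame_box = {
    'image/object/bbox/xmin': float_feature([0.0]),
    'image/object/bbox/ymin': float_feature([0.0]),
    'image/object/bbox/xmax': float_feature([1.0]),
    'image/object/bbox/ymax': float_feature([1.0]),
}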
Upvotes: 0
Views: 149
Reputation: 19143
You definitely need the bounding-box information to train an object detector. Do models potentially use contextual information around the boxes? Maybe, but that is a learned behaviour.
You need to provide images in which your objects appear against multiple backgrounds, at multiple scales, under varied illumination, etc., in order to train a network to detect an object robustly. If you only have pre-cropped images, all you can train is an image classifier; the detection part won't learn anything useful from those images.
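If the classifier route is what you end up taking, a minimal sketch with pre-cropped images could look like this (the directory layout, image size, and model architecture are just assumptions, using TF 2.x Keras):

import tensorflow as tf

# Pre-cropped images sorted into one sub-folder per class (assumed layout).
train_ds = tf.keras.utils.image_dataset_from_directory(
    'cropped_images/', image_size=(224, 224), batch_size=32)

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2),  # e.g. Cat vs. Dog as in the question
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(train_ds, epochs=10)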
Upvotes: 1