Tominator

Reputation: 1224

Tensorflow object detection: cropping large input images into tiles

I have a skewed image of 1100x250 pixels and some small label boxes of about 30x30. My COCO-pretrained model isn't training well, probably because everything gets resized to 300x300.

Some people on the internet suggest cropping my training images to be closer to 300x300 (i.e. making tiles of my photo) and, of course, creating the corresponding annotation files.

However, I can't find any official documentation or scientific papers about this. Is this the way to go?

Thanks for helping; I feel like this is a bit under-explained on the internet.

Upvotes: 1

Views: 1894

Answers (1)

Nam Vu

Reputation: 1757

FYI, here is a scholarly paper to reference: http://openaccess.thecvf.com/content_CVPRW_2019/papers/UAVision/Unel_The_Power_of_Tiling_for_Small_Object_Detection_CVPRW_2019_paper.pdf

We are in the same boat: I'm also working with the SSD MobileNet CNN with 300x300 input tensors. SSD is just not great at small objects, no matter what approach you take. I haven't tried to tweak the model itself, but at the application level I've tried a couple of approaches:

  • Tiling images into uniform tiles before feeding them into the model (a short code sketch follows below). For instance, a 900x900 image that would normally be fed into the model like this:
             1
|---------------------------|
|---------------------------|
|---------------------------|
|---------------------------|
|---------------------------|
|---------------------------|
|---------------------------|
|---------------------------|
|---------------------------|
          900x900

can instead be split like this:

     1           2           3
|---------| |---------| |---------| 
|---------| |---------| |---------| 
|---------| |---------| |---------| 
  300x300

     4           5           6
|---------| |---------| |---------| 
|---------| |---------| |---------| 
|---------| |---------| |---------| 

     7           8           9
|---------| |---------| |---------| 
|---------| |---------| |---------| 
|---------| |---------| |---------| 

[Edit] I made the tile size configurable; it does not need to be exactly 300x300.
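
For reference, here is a minimal sketch of that kind of uniform tiling in Python with NumPy. The 300x300 tile size and the tile_image helper are just illustrative assumptions; the important part is keeping the (y, x) offsets so detections (and annotations) can be mapped back to full-image coordinates:

    import numpy as np

    def tile_image(image, tile_h=300, tile_w=300):
        """Split an (H, W, C) image into non-overlapping tiles.

        Returns a list of (tile, (y_offset, x_offset)) pairs so that boxes
        predicted on a tile can be shifted back into full-image coordinates.
        """
        tiles = []
        h, w = image.shape[:2]
        for y in range(0, h, tile_h):
            for x in range(0, w, tile_w):
                tiles.append((image[y:y + tile_h, x:x + tile_w], (y, x)))
        return tiles

    # A 900x900 image yields the nine 300x300 tiles from the diagram above.
    image = np.zeros((900, 900, 3), dtype=np.uint8)
    print(len(tile_image(image)))  # 9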

  • Selective tiling, after first finding some regions of interest. This approach is similar to the one above, but you first look for a way to detect regions that you know will contain objects and only focus on those. The approach I went with for finding those regions is to find object contours: add some erosion to the image and then grab bounding boxes around the contours, similar to this. Those bounding boxes can then be resized to fit the input tensor shape and passed on to the model for inference (a rough sketch follows below).
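
Here is a rough sketch of that selective approach with OpenCV (assuming OpenCV 4's findContours signature); the threshold value, kernel size and min_area are placeholders you would tune for your own images:

    import cv2
    import numpy as np

    def roi_crops(image, tensor_size=(300, 300), min_area=500):
        """Find candidate regions via erosion + contours, then resize each
        crop to the model's input size, keeping the original box offsets."""
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        _, mask = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
        mask = cv2.erode(mask, np.ones((5, 5), np.uint8), iterations=2)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        crops = []
        for c in contours:
            x, y, w, h = cv2.boundingRect(c)
            if w * h < min_area:
                continue  # skip tiny speckles
            crops.append((cv2.resize(image[y:y + h, x:x + w], tensor_size),
                          (x, y, w, h)))
        return crops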

I got better results with both methods, but obviously they require many more inference passes than a single pass. There is also the post-processing question of how to deal with overlapping objects detected in two different tiles and how to merge everything back together. That is another problem, though; you can look at algorithms like non-max suppression for that (a small sketch is below).
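
As a starting point for that merging step, here is a minimal greedy non-max suppression sketch in NumPy; it assumes boxes are (x1, y1, x2, y2) arrays already shifted back into full-image coordinates (TensorFlow also ships tf.image.non_max_suppression if you prefer to stay in the graph):

    import numpy as np

    def nms(boxes, scores, iou_threshold=0.5):
        """Greedy NMS over full-image boxes given as (x1, y1, x2, y2)."""
        order = np.argsort(scores)[::-1]  # highest score first
        keep = []
        while order.size > 0:
            i, rest = order[0], order[1:]
            keep.append(i)
            # Intersection of the best box with every remaining box
            xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
            yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
            xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
            yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
            inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
            iou = inter / (area_i + area_r - inter)
            order = rest[iou <= iou_threshold]  # drop heavily overlapping boxes
        return keep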

Cheers!

Upvotes: 2
