Reputation: 1224
I have a very wide image of 1100x250 pixels with some small label boxes of 30x30. My COCO-pretrained model isn't training well, probably because everything gets resized to 300x300, which shrinks the labels to only a few pixels.
Some people on the internet suggest cropping my training images to be closer to 300x300 (so making tiles of my photo), and of course creating the matching annotation files.
However, I can't find any official information about this, nor any scientific papers. Is this the way to go?
Thanks for helping, I feel like this is a bit underexplained on the internet.
Upvotes: 1
Views: 1894
Reputation: 1757
FYI, here is a paper you can reference: http://openaccess.thecvf.com/content_CVPRW_2019/papers/UAVision/Unel_The_Power_of_Tiling_for_Small_Object_Detection_CVPRW_2019_paper.pdf
We are in the same boat: I'm also working with the SSD MobileNet CNN with 300x300 input tensors. SSD is just not great at small objects, no matter what approach you take. I've not tried to tweak the model itself, but at the application level I've tried a couple of approaches:
1
|---------------------------|
|---------------------------|
|---------------------------|
|---------------------------|
|---------------------------|
|---------------------------|
|---------------------------|
|---------------------------|
|---------------------------|
900x900
A 900x900 image like the one above can be split into nine 300x300 tiles like this:
1 2 3
|---------| |---------| |---------|
|---------| |---------| |---------|
|---------| |---------| |---------|
300x300
4 5 6
|---------| |---------| |---------|
|---------| |---------| |---------|
|---------| |---------| |---------|
7 8 9
|---------| |---------| |---------|
|---------| |---------| |---------|
|---------| |---------| |---------|
[Edit] I make the tile size configurable; it does not need to be exactly 300x300.
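For illustration, here is a minimal sketch of how the tiling could be done with a configurable tile size. It is not the code I actually run: the function names (`tile_origins`, `make_tiles`, `clip_box_to_tile`), the `overlap` parameter and the `min_visible` cut-off are all assumptions made for this example. Boxes are `(x0, y0, x1, y1)` in pixels. The last helper shows one possible way to produce per-tile annotations for training: a label is kept for a tile only if enough of it falls inside that tile.

```python
def tile_origins(length, tile, overlap=0):
    """Start offsets along one axis so that the tiles cover [0, length)."""
    if tile >= length:
        return [0]
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    starts = list(range(0, length - tile, stride))
    # The last tile is shifted back so that it ends exactly at the image edge.
    starts.append(length - tile)
    return starts


def make_tiles(width, height, tile=300, overlap=0):
    """Return tile rectangles (x0, y0, x1, y1), row by row."""
    tiles = []
    for y in tile_origins(height, tile, overlap):
        for x in tile_origins(width, tile, overlap):
            tiles.append((x, y, min(x + tile, width), min(y + tile, height)))
    return tiles


def clip_box_to_tile(box, tile_rect, min_visible=0.5):
    """Map a full-image box into tile coordinates.

    Returns None when less than `min_visible` of the box area lies inside
    the tile (an assumed heuristic, tune it for your data).
    """
    ix0, iy0 = max(box[0], tile_rect[0]), max(box[1], tile_rect[1])
    ix1, iy1 = min(box[2], tile_rect[2]), min(box[3], tile_rect[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = (box[2] - box[0]) * (box[3] - box[1])
    if area <= 0 or inter / area < min_visible:
        return None
    ox, oy = tile_rect[0], tile_rect[1]
    return (ix0 - ox, iy0 - oy, ix1 - ox, iy1 - oy)


if __name__ == "__main__":
    # The 900x900 example splits into nine 300x300 tiles.
    print(len(make_tiles(900, 900)))
    # The 1100x250 image from the question gives four tiles of height 250.
    for rect in make_tiles(1100, 250):
        print(rect)
```

With `overlap=0` the 900x900 case reproduces the 3x3 grid drawn above. For the 1100x250 image, the last tile is shifted back to the right edge, so it overlaps its neighbour instead of running past the image.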
I got better results with both methods, but obviously they require many more inference passes than a single pass over the whole image. There is also the post-processing question of how to deal with overlapping detections from two different tiles and merge them into one result. That is another problem, though; you can look at algorithms like non-maximum suppression (NMS) for that!
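As a rough sketch of that merging step (again, an illustration under assumptions rather than my production code): shift each tile's detections back into full-image coordinates, then run a plain greedy NMS over the combined list. The names `to_full_image`, `iou` and `nms`, the `(box, score)` tuple layout and the 0.5 IoU threshold are all chosen for this example only.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def to_full_image(det, tile_origin):
    """Shift a (box, score) detection from tile to full-image coordinates."""
    (x0, y0, x1, y1), score = det
    ox, oy = tile_origin
    return ((x0 + ox, y0 + oy, x1 + ox, y1 + oy), score)


def nms(dets, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    kept = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) < iou_threshold for k in kept):
            kept.append((box, score))
    return kept


if __name__ == "__main__":
    # Two overlapping tiles see the same object; a third object is elsewhere.
    per_tile = [
        ((0, 0), [((260, 10, 290, 40), 0.9)]),
        ((250, 0), [((12, 10, 42, 40), 0.8), ((200, 100, 230, 130), 0.7)]),
    ]
    merged = [to_full_image(d, origin) for origin, dets in per_tile for d in dets]
    for box, score in nms(merged):
        print(box, score)
```

Note that NMS only removes duplicates when the tiles overlap. With a strict non-overlapping grid, an object cut by a tile border shows up as two partial boxes that hardly intersect, so some overlap between tiles is usually needed for this to work.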
Cheers!
Upvotes: 2