Tominator

Reputation: 1224

Tensorflow object detection: cropping large input images into tiles

I have a skewed image of 1100x250 pixels and some small label boxes of about 30x30. My COCO-pretrained model isn't training well, probably because everything gets resized to 300x300.

Some people on the internet suggest cropping my training images to be closer to 300x300 (i.e. making tiles of my photo) and, of course, creating the corresponding annotation files.

However, I can't find any official documentation or scientific papers about this. Is this the way to go?

Thanks for helping; I feel like this is a bit under-explained on the internet.

Upvotes: 1

Views: 1894

Answers (1)

Nam Vu

Reputation: 1757

FYI, here is a scholarly paper to reference: http://openaccess.thecvf.com/content_CVPRW_2019/papers/UAVision/Unel_The_Power_of_Tiling_for_Small_Object_Detection_CVPRW_2019_paper.pdf

We are in the same boat: I'm also working with the SSD MobileNet CNN with 300x300 input tensors. SSD is just not great at small objects, no matter what approach you take. I haven't tried to tweak the model itself, but at the application level I've tried a couple of approaches:

  • Tiling images into uniform tiles before feeding them into the model (a short code sketch follows below). For instance, a 900x900 image that would normally be fed into the model like this:
             1
|---------------------------|
|---------------------------|
|---------------------------|
|---------------------------|
|---------------------------|
|---------------------------|
|---------------------------|
|---------------------------|
|---------------------------|
          900x900

can instead be split like this:

     1           2           3
|---------| |---------| |---------| 
|---------| |---------| |---------| 
|---------| |---------| |---------| 
  300x300

     4           5           6
|---------| |---------| |---------| 
|---------| |---------| |---------| 
|---------| |---------| |---------| 

     7           8           9
|---------| |---------| |---------| 
|---------| |---------| |---------| 
|---------| |---------| |---------| 

[Edit] I made the tile size configurable; it does not need to be exactly 300x300.
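
For reference, here is a minimal sketch of that kind of uniform tiling in Python with NumPy. The 300x300 tile size and the tile_image helper are just illustrative assumptions; the important part is keeping the (y, x) offsets so detections (and annotations) can be mapped back to full-image coordinates:

    import numpy as np

    def tile_image(image, tile_h=300, tile_w=300):
        """Split an (H, W, C) image into non-overlapping tiles.

        Returns a list of (tile, (y_offset, x_offset)) pairs so that boxes
        predicted on a tile can be shifted back into full-image coordinates.
        """
        tiles = []
        h, w = image.shape[:2]
        for y in range(0, h, tile_h):
            for x in range(0, w, tile_w):
                tiles.append((image[y:y + tile_h, x:x + tile_w], (y, x)))
        return tiles

    # A 900x900 image yields the nine 300x300 tiles from the diagram above.
    image = np.zeros((900, 900, 3), dtype=np.uint8)
    print(len(tile_image(image)))  # 9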

  • Selective tiling, after first finding some regions of interest. This approach is similar to the one above, but you first look for a way to detect regions that you know will contain objects and only focus on those. The approach I went with for finding those regions is to find object contours: add some erosion to the image and then grab bounding boxes around the contours, similar to this. Those bounding boxes can then be resized to fit the input tensor shape and passed on to the model for inference (a rough sketch follows below).
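
Here is a rough sketch of that selective approach with OpenCV (assuming OpenCV 4's findContours signature); the threshold value, kernel size and min_area are placeholders you would tune for your own images:

    import cv2
    import numpy as np

    def roi_crops(image, tensor_size=(300, 300), min_area=500):
        """Find candidate regions via erosion + contours, then resize each
        crop to the model's input size, keeping the original box offsets."""
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        _, mask = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
        mask = cv2.erode(mask, np.ones((5, 5), np.uint8), iterations=2)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        crops = []
        for c in contours:
            x, y, w, h = cv2.boundingRect(c)
            if w * h < min_area:
                continue  # skip tiny speckles
            crops.append((cv2.resize(image[y:y + h, x:x + w], tensor_size),
                          (x, y, w, h)))
        return crops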

I got better results with both methods, but obviously they require many more inference passes than a single pass. There is also the post-processing question of how to deal with overlapping objects detected in two different tiles and how to merge everything back together. That is another problem, though; you can look at algorithms like non-max suppression for that (a small sketch is below).
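
As a starting point for that merging step, here is a minimal greedy non-max suppression sketch in NumPy; it assumes boxes are (x1, y1, x2, y2) arrays already shifted back into full-image coordinates (TensorFlow also ships tf.image.non_max_suppression if you prefer to stay in the graph):

    import numpy as np

    def nms(boxes, scores, iou_threshold=0.5):
        """Greedy NMS over full-image boxes given as (x1, y1, x2, y2)."""
        order = np.argsort(scores)[::-1]  # highest score first
        keep = []
        while order.size > 0:
            i, rest = order[0], order[1:]
            keep.append(i)
            # Intersection of the best box with every remaining box
            xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
            yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
            xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
            yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
            inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
            iou = inter / (area_i + area_r - inter)
            order = rest[iou <= iou_threshold]  # drop heavily overlapping boxes
        return keep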

Cheers!

Upvotes: 2
