Reputation: 117
Many existing TensorFlow and Keras CNN code examples use the same sizes for training images, often 299*299, 224*224, 256*256, and a couple of others. I presume this depends partly on compatibility with pre-trained models, as well as on the architecture itself.
I'm still evaluating architectures, but will probably end up with Mask R-CNN (or possibly Faster R-CNN), using ResNet, Inception, or Xception as the backbone, with TensorFlow or Keras. The target images to be analyzed are in the range of 1024*1024, but they can be broken into smaller partitions.
Given the available pre-trained models, are there training image sizes that would afford any advantages? I'd like to avoid having to resize afterward, as that would diminish image clarity in some cases.
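For reference, this is roughly the kind of partitioning I have in mind; just a minimal sketch with NumPy, and the 512-pixel tile size is an arbitrary example, not a requirement:

```python
import numpy as np

def tile_image(image, tile=512):
    """Split an H x W x C array into non-overlapping tile x tile patches.
    Assumes H and W are multiples of the tile size (e.g. 1024 -> four 512 tiles)."""
    h, w = image.shape[:2]
    tiles = []
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            tiles.append(image[y:y + tile, x:x + tile])
    return tiles

# Example: a 1024*1024 RGB image becomes four 512*512 tiles.
img = np.zeros((1024, 1024, 3), dtype=np.uint8)
patches = tile_image(img, tile=512)
print(len(patches))  # 4
```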
Upvotes: 2
Views: 7015
Reputation: 397
According to Matterport's implementation, which can be found at https://github.com/matterport/Mask_RCNN, the input size for the images is 1024x1024. Also, in the paper they mention using 1024 pixels as the input size when running on Cityscapes (check Appendix B, I believe).
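As a rough sketch of where that 1024 comes from in that repo: the resize behaviour is controlled by config attributes (names taken from the repo's mrcnn/config.py, but verify against the version you install):

```python
# Minimal sketch of a Matterport Mask_RCNN config subclass.
# Attribute names follow mrcnn/config.py in that repo; the NAME and
# NUM_CLASSES values here are just placeholders for illustration.
from mrcnn.config import Config

class MyConfig(Config):
    NAME = "my_dataset"            # hypothetical project name
    IMAGE_RESIZE_MODE = "square"   # resize and pad to a square input
    IMAGE_MIN_DIM = 800
    IMAGE_MAX_DIM = 1024           # images end up as 1024x1024 inputs
    NUM_CLASSES = 1 + 1            # background + one object class (example)
```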
Upvotes: 0
Reputation: 117
OK, I found a partial answer to this:
Girshick's Faster R-CNN apparently scales input images internally so that their shorter dimension becomes 600 pixels, while the longer edge is clamped at 1000 pixels. It sounds like this was due to memory limitations of the GPUs available at the time.
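As a worked example of that rule (just the arithmetic as I understand it, not code from the paper): the scale factor aims the short side at 600, then gets reduced if that would push the long side past 1000.

```python
def faster_rcnn_scale(height, width, target_short=600, max_long=1000):
    """Scale factor per the resizing rule described above: shorter side ->
    target_short, unless that would make the longer side exceed max_long,
    in which case the longer side is clamped to max_long instead."""
    short, long = min(height, width), max(height, width)
    scale = target_short / short
    if long * scale > max_long:
        scale = max_long / long
    return scale

# A 1024*1024 image: 600/1024 ~= 0.586; the long side also lands at 600,
# so no clamping is needed.
print(faster_rcnn_scale(1024, 1024))  # ~0.586
# A 600*1200 image would be clamped: 1000/1200 ~= 0.833 instead of 1.0.
print(faster_rcnn_scale(600, 1200))   # ~0.833
```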
Given that image scaling imposes a CPU hit and also causes some aliasing of edges, it seems there could be an advantage in preprocessing the images beforehand.
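If preprocessing offline, a high-quality resampling filter helps with the aliasing concern. A minimal sketch using Pillow; the Lanczos filter is just one reasonable choice:

```python
from PIL import Image

def preprocess(src_path, dst_path, size=(1024, 1024)):
    """Resize once, offline, with a high-quality filter, so the network
    does not have to rescale (and alias) the image at training time."""
    img = Image.open(src_path)
    img = img.resize(size, resample=Image.LANCZOS)
    img.save(dst_path)
```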
I have not found the equivalent information for Mask R-CNN yet.
Upvotes: 2