Pedro Azevedo

Reputation: 119

Best CNN architectures for small images (80x80)?

I'm new to the computer vision area and I hope you can help me with some fundamental questions regarding CNN architectures.

I know some of the most well-known ones are: VGG Net, ResNet, DenseNet, Inception Net, and Xception Net.

They usually need input images of around 224x224x3, and I have also seen 32x32x3.

Regarding my specific problem, my goal is to train on biomedical images of size 80x80 for a 4-class classification, so at the end I'll have a dense layer of 4. Also, my dataset is quite small (1000 images) and I wanted to use transfer learning.

Could you please help me with the following questions? It seems to me that there is no single correct answer to them, but I need to understand what the correct way of thinking about them should be. I would appreciate it if you could give me some pointers as well.

  1. Should I upscale my images to 224x224? Or do the opposite and shrink them to 32x32 inputs?
  2. Should I change the input of the CNNs to 80x80? Which parameters should I mainly change? Is there any specific ratio for the kernels and the parameters?
  3. I also have another problem: the input requires 3 channels (RGB), but I'm working with grayscale images. Will that change the results a lot?
  4. Instead of scaling, should I just fill the surroundings (between the 80x80 and the 224x224) as background? Should the images be centered in this case?
  5. Do you have any recommendations regarding what architecture to choose?
  6. I've seen some adaptations of these architectures to 3D/volume inputs instead of 2D/images. I have a similar problem to the one I described here, but with 3D inputs. Is there any common reasoning when choosing a 3D CNN architecture instead of a 2D one?

Thanks in advance!

Upvotes: 3

Views: 5007

Answers (1)

yapws87

Reputation: 1839

I am assuming you have basic know-how in using CNNs for classification.

Answering questions 1 to 3

You scale your images for several purposes. The smaller the image, the faster the training and inference time; however, you will lose important information in the process of shrinking it. There is no single right answer, and it all depends on your application. Is real-time processing important? If your answer is no, stick to the original size.

You will also need to resize your images to fit the input size of the predefined models if you plan to retrain them. However, since your images are grayscale, you will need to either find models trained on grayscale data or create a 3-channel image by copying the same value into the R, G and B channels. This is not efficient, but it will let you reuse the high-quality models trained by others.
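As a rough illustration, here is a minimal sketch of that grayscale-to-RGB trick, assuming TensorFlow/Keras; the helper name `to_pretrained_input` and the 224x224 target size are just placeholders for whichever pretrained model you end up choosing.

    import numpy as np
    import tensorflow as tf

    def to_pretrained_input(gray_batch, target_size=(224, 224)):
        # Resize (N, 80, 80, 1) grayscale images to the pretrained input size
        # and replicate the single channel into R, G and B.
        resized = tf.image.resize(gray_batch, target_size)    # (N, 224, 224, 1)
        return tf.repeat(resized, repeats=3, axis=-1)          # (N, 224, 224, 3)

    # Dummy batch of 8 grayscale 80x80 images
    batch = np.random.rand(8, 80, 80, 1).astype("float32")
    print(to_pretrained_input(batch).shape)                    # (8, 224, 224, 3)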

The best way I see for you to handle this problem is to train everything from scratch. 1000 images can seem like a small amount of data, but since your domain is specific and only requires 4 classes, training from scratch doesn't seem that bad.
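If you do go the train-from-scratch route, a small network is usually enough at 80x80. The sketch below is only one possible layout in Keras; the layer sizes and dropout rate are assumptions, not tuned values.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    # Small CNN for 80x80 grayscale inputs and 4 output classes.
    model = models.Sequential([
        layers.Input(shape=(80, 80, 1)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.5),            # helps with only ~1000 images
        layers.Dense(4, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])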

Question 4

When the size is different, always scale. Filling with the surrounding background will cause the model to learn the empty spaces, and that is not what we want. Also make sure the input size and format during inference are the same as the input size and format during training.
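One way to keep training and inference consistent is to route both through the same preprocessing function. This is just a sketch, assuming TensorFlow and pixel values in the 0-255 range:

    import tensorflow as tf

    def preprocess(image, target_size=(80, 80)):
        # Same resize and normalisation for training and inference.
        image = tf.image.resize(image, target_size)   # scale, do not pad
        return tf.cast(image, tf.float32) / 255.0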

Question 5

If processing time is not a problem, use ResNet. If processing time is important, then use MobileNet.
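For reference, here is a hedged sketch of how either backbone could be wired up with Keras; `ResNet50` can be swapped for `tf.keras.applications.MobileNet`, and freezing the backbone is only a suggestion given the small dataset.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    base = tf.keras.applications.ResNet50(
        include_top=False, weights="imagenet", input_shape=(224, 224, 3))
    base.trainable = False   # freeze the pretrained weights for a small dataset

    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(4, activation="softmax"),   # 4-class head
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])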

Question 6

It depends on your input data. If you have 3D data, then you can use it. More input data usually helps in better classification, but 2D will be enough to solve certain problems. If you can classify the images by looking at the 2D images, most probably 2D images will be enough to complete the task.
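Structurally, the 3D case mostly means swapping the 2D layers for their 3D counterparts; the 80x80x80 volume below is an assumed shape for illustration, not something from your description.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    # 3D variant: Conv2D/MaxPooling2D become Conv3D/MaxPooling3D,
    # and the input gains a depth axis.
    model = models.Sequential([
        layers.Input(shape=(80, 80, 80, 1)),   # assumed 80-voxel cube
        layers.Conv3D(32, 3, activation="relu"),
        layers.MaxPooling3D(),
        layers.Conv3D(64, 3, activation="relu"),
        layers.GlobalAveragePooling3D(),
        layers.Dense(4, activation="softmax"),
    ])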

I hope this clears up some of your problems and points you toward a proper solution.

Upvotes: 3
