Choosing training images for a convolutional neural network

Question

The goal is to localise objects in images. I decided to modify and train an existing model. However, I can't decide whether I should train the model using masks or only ROIs.

For example: for class 1 data, only the class 1 object will be visible in the image, and every other region will be filled with 0s. For the 2nd class I'll do the same thing and leave only the 2nd class's object in the mask, and so on for the 3rd and 4th classes.

The second way, using ROIs: I'll crop each class's object out of the image without a mask, keeping only the region of interest.
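As a minimal sketch of the two options described above, assuming each image is a NumPy array and each object is given by a bounding box. The `(top, left, bottom, right)` box format and the function names `mask_sample` and `roi_sample` are illustrative assumptions, not something from the linked project. A pixel-accurate mask would work the same way with a boolean array instead of a box.

```python
import numpy as np

def mask_sample(image, box):
    """Mask approach: keep the full frame, zero everything outside the box."""
    top, left, bottom, right = box  # assumed box format
    masked = np.zeros_like(image)
    masked[top:bottom, left:right] = image[top:bottom, left:right]
    return masked

def roi_sample(image, box):
    """ROI approach: crop to the box and discard the rest of the frame."""
    top, left, bottom, right = box  # assumed box format
    return image[top:bottom, left:right].copy()

# Toy 6x8 grayscale image with non-zero pixels; the box is hypothetical.
image = np.arange(1, 49, dtype=np.uint8).reshape(6, 8)
box = (1, 2, 4, 6)

masked = mask_sample(image, box)  # same size as the input, object kept in place
roi = roi_sample(image, box)      # smaller array, position information lost
```

The masked sample keeps the object's position within the frame, while the ROI sample keeps only its appearance, which is the trade-off the question is asking about.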

Then I hope to continue with something similar to this: https://github.com/jazzsaxmafia/Weakly_detector

Should I choose the first way or the second? Any comments like "Your plan won't work, try this" are also appreciated.

--Edit-- To be clear,

Original image : http://s31.postimg.org/btyn660bf/image.jpg

1st approach, using masks:

1st class : http://s31.postimg.org/4s0pjywpn/class11.png
2nd class : http://s31.postimg.org/3zy1krsij/class21.png
3rd class : http://s31.postimg.org/itcp5j09n/class31.png
4th class : http://s31.postimg.org/yowxv31gb/class41.png

2nd approach, using ROIs:

1st class : http://s31.postimg.org/4x4gtn40r/class1.png
2nd class : http://s31.postimg.org/8s7uw7n6j/class2.png
3rd class : http://s31.postimg.org/mxdny0w7v/class3.png
4th class : http://s31.postimg.org/qfpnuex3v/class4.png

P.S.: The locations of the objects will be very similar in new examples, so the mask approach may be a bit more useful. For the ROI approach I need to normalise each object, and the objects have very different sizes. Normalising the whole masked image, on the other hand, may deviate much less from the original image.

Aenimated1 · Accepted Answer

CNNs are generally quite robust to varying backgrounds assuming they're trained on a large amount of high-quality data. So I would guess that the difference between using the mask and ROI approaches won't be very substantial. For what it's worth, you will need to normalize the size of the images you're feeding to the CNN, regardless of which approach you use.
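To illustrate the size normalization mentioned above, here is a minimal sketch that brings both a full-frame masked image and a small ROI crop to the same input size. It uses nearest-neighbour sampling only to stay self-contained; in practice an image library's interpolating resize would normally be used. The function name `resize_nearest` and the 4x4 target size are assumptions for the example.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Resize an (H, W) or (H, W, C) array by nearest-neighbour sampling."""
    in_h, in_w = image.shape[:2]
    # For each output row/column, pick the source row/column it maps to.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols]

full_frame = np.arange(48).reshape(6, 8)  # stands in for a masked image
roi_crop = full_frame[1:4, 2:6]           # stands in for an ROI crop

# Both variants end up with the same shape before being fed to the network.
net_input_a = resize_nearest(full_frame, 4, 4)
net_input_b = resize_nearest(roi_crop, 4, 4)
```

Note that a plain resize like this changes the aspect ratio of ROIs with different shapes; whether that matters is something to check experimentally.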

I have implemented some gesture recognition software and encountered a similar question. I could just use the raw, unprocessed ROI, or I could use a pre-processed version that filtered out much of the background. I basically tried it both ways and compared the accuracy of the models. In my case, I was able to get slightly better results from the pre-processed images. On the other hand, the backgrounds in my images were much more complex and varied. Anyway, my recommendation would be to build a solid mechanism for testing the accuracy of your model and experiment to see what works best.
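One way to set up that kind of comparison is sketched below, under stated assumptions: every preprocessing variant is scored on the same held-out samples, so the accuracies are directly comparable. The names `split_indices`, `compare_variants` and the `fit_predict` hook are hypothetical, and the 1-nearest-neighbour function is only a stand-in for training and running the actual CNN.

```python
import random

def split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle once so every variant is evaluated on the same held-out items."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(n * test_fraction))
    return indices[n_test:], indices[:n_test]

def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)

def compare_variants(variants, labels, fit_predict, test_fraction=0.2, seed=0):
    """variants maps a name (e.g. 'mask', 'roi') to samples aligned with labels.

    fit_predict(train_x, train_y, test_x) is a hypothetical hook that trains
    a model and returns predicted labels for test_x.
    """
    train_idx, test_idx = split_indices(len(labels), test_fraction, seed)
    train_y = [labels[i] for i in train_idx]
    test_y = [labels[i] for i in test_idx]
    results = {}
    for name, samples in variants.items():
        train_x = [samples[i] for i in train_idx]
        test_x = [samples[i] for i in test_idx]
        results[name] = accuracy(fit_predict(train_x, train_y, test_x), test_y)
    return results

def nearest_neighbour(train_x, train_y, test_x):
    """Stand-in model on scalar features; replace with the real CNN."""
    predictions = []
    for x in test_x:
        best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        predictions.append(train_y[best])
    return predictions

# Toy data: 'informative' features separate the classes, 'flat' ones do not.
labels = [0, 1] * 10
variants = {
    "informative": [float(label) for label in labels],
    "flat": [0.0] * len(labels),
}
results = compare_variants(variants, labels, nearest_neighbour)
```

With real data, `variants` would hold the masked and ROI versions of the same source images, and a single train/test split could be replaced by cross-validation for a more stable estimate.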

Honestly, the most important thing is collecting lots of good samples for each class. In my case, I kept seeing substantial improvements until I hit about 5000 images per class. Since collecting lots of data takes a long time, it's best to capture and store the raw, full-size images, along with any metadata involved in the actual collection of the data, so that you can experiment with different approaches (masking vs. ROI, varying input image sizes, other pre-processing such as histogram normalization, etc.) without having to collect new data.
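A simple way to keep raw images and their metadata together is a manifest file with one JSON record per image, so that masks and ROIs can be regenerated later from the untouched originals. This is only a sketch: the field names (`image`, `label`, `box`) and the helper names `record_sample` and `load_manifest` are assumptions, not an established format.

```python
import json

def record_sample(manifest_path, image_file, label, box, **extra):
    """Append one record describing a raw image to a JSON-lines manifest."""
    entry = {"image": image_file, "label": label, "box": list(box), **extra}
    with open(manifest_path, "a", encoding="utf-8") as manifest:
        manifest.write(json.dumps(entry) + "\n")

def load_manifest(manifest_path):
    """Read all records back as a list of dictionaries."""
    with open(manifest_path, encoding="utf-8") as manifest:
        return [json.loads(line) for line in manifest if line.strip()]

# Example usage (paths and values are hypothetical):
# record_sample("manifest.jsonl", "raw/0001.png", 2, (10, 20, 110, 140), camera="cam0")
# samples = load_manifest("manifest.jsonl")
```

Because the annotations live next to the raw files rather than being baked into pre-processed copies, switching between the mask and ROI variants only requires re-running the pre-processing step.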
