Reputation: 1598
I have image patches from DDSM Breast Mammography that are 150x150 in size. I would like to augment my dataset by randomly cropping these images to 120x120, doubling its size. So, if my dataset contains 6500 images, augmenting it with random crops should get me to 13000 images. The thing is, I do NOT want to lose potential information in the image and possibly change the ground truth label.
What would be the best way to do this? Should I crop them randomly from 150x150 to 120x120 and hope for the best, or should I pad them first and then perform the cropping? What is the standard way to approach this problem?
Upvotes: 0
Views: 1096
Reputation: 5310
If your ground truth contains the exact location of what you are trying to classify, use it to crop your images in an informed way, i.e. adjust the ground truth label whenever a crop removes what you are trying to classify.
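A minimal sketch of such an informed crop, assuming the ground truth is available as a pixel bounding box `(x0, y0, x1, y1)`. The function name `informed_random_crop` and the box format are assumptions for illustration, not something DDSM or a particular library provides. Instead of changing the label, this variant only samples crop windows that still contain the whole box:

```python
import numpy as np

def informed_random_crop(image, box, crop=120, rng=None):
    """Randomly crop `image` to crop x crop while keeping `box` fully inside.

    box = (x0, y0, x1, y1) in half-open pixel coordinates (assumed format).
    Returns the cropped patch and the box shifted into crop coordinates.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    if x1 - x0 > crop or y1 - y0 > crop:
        raise ValueError("box does not fit inside the crop window")
    # Valid offsets: the window must stay in the image and cover the box.
    left_min, left_max = max(0, x1 - crop), min(x0, w - crop)
    top_min, top_max = max(0, y1 - crop), min(y0, h - crop)
    left = int(rng.integers(left_min, left_max + 1))
    top = int(rng.integers(top_min, top_max + 1))
    patch = image[top:top + crop, left:left + crop]
    return patch, (x0 - left, y0 - top, x1 - left, y1 - top)
```

If the box is larger than the crop window the function raises, which is the case where you would have to decide whether the label still holds.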
If you don't know the location of what you are classifying, you could
But how do you "find out what regions your classifier reacts to"? Multiple ways are described in Visualizing and Understanding Convolutional Networks by Zeiler and Fergus:
Imagine your classifier predicts breast cancer vs. no breast cancer. Take an image that is positive for breast cancer, occlude part of it with some blank color (see the gray square in the image above, image by Zeiler et al.), and run the prediction. Now move the occluding square around. In the end you get rough prediction scores for all parts of your original image (see (d) in the image above), because whenever the square covers the part that is responsible for the positive prediction, you should get a negative prediction.
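The occlusion procedure can be sketched roughly like this. `predict` is a hypothetical callable that takes one image and returns the probability of the positive class; the patch size, stride, and fill value are arbitrary assumptions and the image is assumed to be a float array:

```python
import numpy as np

def occlusion_map(image, predict, patch=30, stride=15, fill=0.5):
    """Slide a blank square over `image` and record how much the score drops.

    `predict` is a hypothetical function: image -> positive-class score.
    Large values in the returned map mark regions the classifier relies on.
    """
    h, w = image.shape[:2]
    baseline = predict(image)
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    drops = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            top, left = i * stride, j * stride
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill
            drops[i, j] = baseline - predict(occluded)
    return drops
```

Regions with a large drop are the ones you would want to keep inside every crop.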
If you have someone who can actually recognize cancer in an image, this is also a good way to check for and guard against confounding factors.
BTW: You might want to crop on-the-fly and randomize how you crop even more to generate way more samples.
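A minimal sketch of on-the-fly cropping with plain NumPy, assuming the images are stacked in an array of shape `(N, 150, 150)` and that every crop keeps the label of its source image (which is exactly the assumption you need to verify). The generator name `random_crop_batches` is made up for illustration:

```python
import numpy as np

def random_crop_batches(images, labels, crop=120, batch_size=32, rng=None):
    """Yield endless batches of freshly sampled random crops.

    Each batch draws new images and new crop offsets, so the same crop is
    rarely seen twice. Labels are copied unchanged (an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, h, w = images.shape[:3]
    while True:
        idx = rng.integers(0, n, size=batch_size)
        tops = rng.integers(0, h - crop + 1, size=batch_size)
        lefts = rng.integers(0, w - crop + 1, size=batch_size)
        batch = np.stack([images[k, t:t + crop, l:l + crop]
                          for k, t, l in zip(idx, tops, lefts)])
        yield batch, labels[idx]
```

With 150x150 inputs and 120x120 crops there are 31 x 31 = 961 possible windows per image, so this gives far more variety than a fixed set of 13000 precomputed crops.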
If the 150x150 is already the region of interest (ROI) you could try the following data augmentations:
Upvotes: 1