Reputation: 35
I'm trying to use deep learning (a 3D CNN) to perform brain disease classification. Currently the input size is set to 96*96*96, because the original scans have a size of 256*256*256: I first removed the background by cropping to 192*192*192, then downsampled by a factor of 2.
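For reference, a minimal sketch of that preprocessing, assuming NIfTI input and a center crop (the crop offsets here are illustrative, not my exact ones):

```python
# Rough preprocessing sketch: load a NIfTI scan, center-crop to 192^3,
# then subsample by 2 to get 96^3. Offsets are hypothetical.
import nibabel as nib
import numpy as np

def preprocess(path):
    vol = nib.load(path).get_fdata()              # shape (256, 256, 256)
    start = [(s - 192) // 2 for s in vol.shape]   # hypothetical center crop
    vol = vol[start[0]:start[0]+192,
              start[1]:start[1]+192,
              start[2]:start[2]+192]
    # simple 2x subsampling; scipy.ndimage.zoom would interpolate instead
    return vol[::2, ::2, ::2].astype(np.float32)  # -> (96, 96, 96)
```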
However, my dataset only contains 825 subjects. I want to augment the dataset to a sufficient size, but this has been troubling me a lot.
First of all, 96^3 = 884,736, so the input has roughly 884k voxels. From my past experience, the number of training samples should be a lot more than the number of input units. So my first question is: am I right that the number of training samples should exceed the number of input units (in this case, more than 884k)?
Secondly, to perform data augmentation, what techniques are recommended? So far I have tried rotations around the 3 axes at 10-degree intervals, but that only enlarges the dataset by a factor of roughly 100.
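For reference, here is a sketch of the kind of additional 3D augmentations I've been considering (random small rotations, flips, and shifts) with scipy.ndimage; the ranges and probabilities are illustrative, not tuned values:

```python
# On-the-fly 3D augmentation sketch: random rotation about a random pair
# of axes, random flips, and a small random translation.
import numpy as np
from scipy.ndimage import rotate, shift

def augment(vol, rng=np.random):
    # small random rotation in the plane of two randomly chosen axes
    axes = tuple(rng.permutation(3)[:2])
    vol = rotate(vol, angle=rng.uniform(-10, 10),
                 axes=axes, reshape=False, order=1)
    # random flip along each axis (check whether left/right flips
    # make sense anatomically for your task)
    for ax in range(3):
        if rng.rand() < 0.5:
            vol = np.flip(vol, axis=ax)
    # small random translation, in voxels
    return shift(vol, rng.uniform(-4, 4, size=3), order=1)
```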
Thirdly, when training models, I used to append the input data to a list and split it with sklearn's train_test_split function. Another way is to use Keras' ImageDataGenerator.flow_from_directory. However, now I'm dealing with 3D data, and no reasonable amount of memory can hold thousands of 3D arrays at once. ImageDataGenerator also does not support the NIfTI file format. Is there a way to prepare all my 3D arrays for training without exhausting memory? (I imagine something like ImageDataGenerator; my understanding is that a data generator feeds the model one batch at a time. Correct me if I'm wrong.)
Upvotes: 2
Views: 657
Reputation: 73
I face the exact same problem with data augmentation for MRI, so I can't help on that part and would rather find an answer myself. However, on the generator question: I prepare my dataset BEFORE training, in a separate script where I do all the transformations (resizing, 3D array preparation, train/test split, ...), and I put everything into 3 big arrays (train, validation and test). I save these arrays using Numpy's save function. You can also split each array into smaller arrays. If it still doesn't fit in your RAM, you can create a subclass of Keras' Sequence.
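A minimal sketch of such a Sequence subclass, assuming one pre-saved .npy file per subject (the paths and labels are placeholders):

```python
# Keras Sequence sketch: loads pre-saved .npy volumes one batch at a
# time, so only one batch ever sits in RAM.
import numpy as np
from tensorflow.keras.utils import Sequence

class VolumeSequence(Sequence):
    def __init__(self, paths, labels, batch_size=4):
        self.paths, self.labels = paths, labels
        self.batch_size = batch_size

    def __len__(self):
        # number of batches per epoch
        return int(np.ceil(len(self.paths) / self.batch_size))

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        x = np.stack([np.load(p) for p in self.paths[lo:hi]])
        y = np.asarray(self.labels[lo:hi])
        return x[..., np.newaxis], y  # add a channel axis for the 3D CNN
```

Keras' fit accepts a Sequence directly, e.g. model.fit(VolumeSequence(train_paths, train_labels), epochs=10). If you keep everything in one big saved array instead, np.load(path, mmap_mode='r') memory-maps it rather than reading it all into RAM.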
Upvotes: 0