Reputation: 652
For image analysis:
Is it better to increase the number of epochs, say from 2 to 4, for 40,000 images? This doubles the training time.
OR
Is it better to increase the size of the training data from 40,000 to 80,000 images, but with only 2 epochs? This also doubles the training time.
Since increasing both the number of epochs and the training data would take too much time, I can only do one.
Which should I choose?
Thank you.
Upvotes: 9
Views: 3592
Reputation: 524
Having more data is always a good approach. As for epochs, too many can lead to overfitting, while too few can lead to underfitting. You can use the EarlyStopping callback in Keras, which stops training once the monitored metric (for example, validation loss) stops improving.
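The patience logic behind early stopping can be sketched in plain Python. This is a simplified illustration, not the Keras implementation; the function name `early_stopping_epoch` and the loss values are hypothetical. In Keras itself you would pass something like `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2, restore_best_weights=True)` in the `callbacks` list of `model.fit`.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Hypothetical helper: simulate patience-based early stopping.

    val_losses: validation loss recorded at the end of each epoch (0-indexed).
    Returns (stop_epoch, best_epoch).
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Improvement: remember it and reset the patience counter.
            best_loss = loss
            best_epoch = epoch
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # No improvement for `patience` epochs: stop here.
                return epoch, best_epoch
    # Ran out of epochs without triggering the stop condition.
    return len(val_losses) - 1, best_epoch


# Validation loss stops improving after epoch 2, so training halts at epoch 4.
print(early_stopping_epoch([0.9, 0.7, 0.6, 0.62, 0.65, 0.5], patience=2))
```

With `restore_best_weights=True`, Keras additionally rolls the model back to the weights from the best epoch (epoch 2 in this toy run).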
Also, if your data is limited, you can use data augmentation to increase the variety of images your model sees during training.
Please refer to OpenCV and scikit-image for different image transformation techniques, such as:
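As a minimal sketch of what such transformations do, the hypothetical helpers below flip and rotate an image represented as a nested list of pixel values (an assumption made to keep the example dependency-free). In practice you would apply the equivalent OpenCV or scikit-image operations, for example `cv2.flip` or `skimage.transform.rotate`, to NumPy arrays, usually with randomly chosen transforms per batch.

```python
def hflip(img):
    """Mirror the image left-to-right (reverse each row)."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror the image top-to-bottom (reverse the row order, copying rows)."""
    return [list(row) for row in reversed(img)]


def rot90_cw(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment(img):
    """Hypothetical helper: return the original image plus three transformed copies."""
    return [img, hflip(img), vflip(img), rot90_cw(img)]


image = [[1, 2],
         [3, 4]]
for variant in augment(image):
    print(variant)
```

Whether a given transform is appropriate depends on the task: a horizontal flip is harmless for most natural photos but can change the meaning of text or digits.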
Upvotes: 3
Reputation: 19123
Caveats aside (bad or confusing samples, ...), increasing the data is always preferred. The reason is generalization: you can show the same image N times to the network, or N different images. In the first case, the network tends to overfit to the training dataset and fail to generalize to new images.
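Assuming the cost per image presentation is constant, both options in the question amount to the same total number of image presentations; they differ only in how many of those presentations are distinct images:

```latex
\underbrace{40\,000 \times 4}_{\text{more epochs}} \;=\; 160\,000 \;=\; \underbrace{80\,000 \times 2}_{\text{more data}}
```

With more epochs, each of 40,000 images is seen 4 times; with more data, twice as many distinct images are each seen 2 times.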
That's also the reason why data augmentation techniques exist: if you don't have any "new" data to train on, you can attempt to generate "new" samples by applying transformations to the ones you have.
Of course, more data means bigger datasets to gather, clean, annotate, store, and distribute, which eventually limits the size of real-world datasets. But if you have more data available to train on, use it.
Upvotes: 4