Reputation: 1505
Do you know if it's possible to use a very small subset of my training data (for example, only 100 or 500 instances) to quickly train rough CNNs in order to compare different architectures, and then select the best-performing one?
By "possible", I mean: is there evidence that this kind of selection strategy works, and that the selected network will consistently outperform the others on this specific task?
Thank you,
For information, the project in question would consist of a two-stage CNN pipeline to classify multichannel time series. The first CNN would forecast the input data over the next period of time; the second CNN would then use this forecast and classify the result into one of two categories.
Upvotes: 0
Views: 272
Reputation: 7432
The procedure you are talking about is actually used in practice. When tuning hyperparameters, a lot of people select a subset of the whole dataset to do this.
Is the best architecture on the subset necessarily the best on the full dataset? NO! However, it's the best guess you have and that's why it's useful.
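To make the subset procedure concrete, here is a minimal sketch of how the comparison could be organised. It is not a definitive recipe: `rank_on_subset`, `train_and_score` and the candidate configs are hypothetical names introduced for illustration, and `toy_train_and_score` is only a stand-in so the snippet runs. Repeating the comparison over several random subsets gives a rough idea of how stable the ranking is, which matters because the winner on a subset is a guess rather than a guarantee.

```python
import random
import statistics


def rank_on_subset(candidates, dataset, train_and_score,
                   subset_size=500, n_repeats=5, val_fraction=0.2, seed=0):
    """Rank candidate architectures by mean validation score on random subsets.

    candidates:      dict mapping a name to a (hypothetical) architecture config
    dataset:         list of training instances
    train_and_score: user-supplied callable (config, train, val, seed) -> float,
                     higher is better; assumed to train a model and evaluate it
    """
    rng = random.Random(seed)
    scores = {name: [] for name in candidates}
    for repeat in range(n_repeats):
        # Draw a fresh random subset; sample() returns it in random order,
        # so slicing it gives a random train/validation split.
        subset = rng.sample(dataset, min(subset_size, len(dataset)))
        n_val = max(1, int(len(subset) * val_fraction))
        val, train = subset[:n_val], subset[n_val:]
        # Every candidate sees the same subset and split within a repeat.
        for name, config in candidates.items():
            scores[name].append(train_and_score(config, train, val, seed=repeat))
    summary = {
        name: (statistics.mean(s), statistics.stdev(s) if len(s) > 1 else 0.0)
        for name, s in scores.items()
    }
    # Best mean score first.
    return sorted(summary.items(), key=lambda item: item[1][0], reverse=True)


def toy_train_and_score(config, train, val, seed):
    # Stand-in only: returns a made-up score so the sketch is runnable.
    # Replace with real training and validation of the CNN described by config.
    noise = random.Random(seed).gauss(0.0, 0.02)
    return 0.5 + 0.05 * config["depth"] + noise


candidates = {"shallow": {"depth": 2}, "deep": {"depth": 6}}  # hypothetical configs
dataset = list(range(50_000))  # placeholder for real instances
for name, (mean, std) in rank_on_subset(candidates, dataset, toy_train_and_score):
    print(f"{name}: mean={mean:.3f} std={std:.3f}")
```

If the standard deviation across repeats is of the same order as the gap between two candidates, the subset is probably too small to tell them apart.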
A couple of things to note on your question:
100-500 instances is extremely low! The CNN still needs to be trained. When we say subset we usually mean tens of thousands of images (out of the millions of the dataset). If your dataset is under 50000 images then why do you need a subset? Train on the whole dataset.
Contrary to what a lot of people believe, the fine details of the architecture are of little importance to classification performance. Some of the hyperparameters you mention (e.g. kernel size) are of secondary importance. The key things to focus on are depth, size of layers, and the use of pooling, skip connections, batch norm, dropout, etc.
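As a rough illustration of searching over these coarse choices rather than fine details, the sketch below enumerates a small grid of candidate configurations. The key names and the values in `SEARCH_SPACE` are assumptions picked for the example, not recommendations; secondary knobs such as kernel size are simply left fixed.

```python
from itertools import product

# Hypothetical search space over the coarse architectural choices.
# All values here are illustrative assumptions.
SEARCH_SPACE = {
    "depth": [2, 4, 8],           # number of convolutional blocks
    "width": [32, 64],            # channels per layer
    "pooling": [False, True],
    "skip_connections": [False, True],
    "batch_norm": [False, True],
    "dropout": [0.0, 0.3],
}


def enumerate_configs(space):
    """Return every combination in the search space as a list of dicts."""
    keys = list(space)
    return [dict(zip(keys, values))
            for values in product(*(space[k] for k in keys))]


configs = enumerate_configs(SEARCH_SPACE)
print(len(configs))  # 3 * 2 * 2 * 2 * 2 * 2 = 96 candidate configurations
print(configs[0])
```

Even a small grid like this grows quickly, which is one reason to evaluate candidates on a subset before training the most promising ones on the full dataset.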
Upvotes: 1