Reputation: 229
I have made 3 groups of image data:
1- train, containing 130,523 images.
2- validation, containing 14,503 images.
3- test, containing 94,500 images.
Now I want to create .lmdb files from my data to use for training. The tutorial says to group your data into train and val. Does that mean I should use only the train and val sets and not use the test set at all? Later, when I want to test my model, what happens to the test set? Shouldn't it be converted to .lmdb as well? I want to make sure I understand the differences. Sorry if the question is very basic, but I could not find an answer.
Upvotes: 3
Views: 1269
Reputation: 361
Sometimes the terms validation and test are used interchangeably (at least in Caffe). However, judging from the size of each of your sets, I think the validation set (~14k images) is meant for checking the accuracy of your trained model before you actually test it on unseen, real-world data. Your test set (~94k images) then plays the role of that unseen real-world data.
To get insight into how the train-val-test process works, also have a look at the examples provided in the caffe directory: 00-classification.ipynb and 01-learning-lenet.ipynb should be enough.
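To make the roles of the three sets concrete, here is a toy sketch in plain Python (not Caffe). A hypothetical 1-D k-nearest-neighbour classifier stands in for the network: the training set is what the model is fit to, the validation set is used to compare candidate settings (here the value of `k`), and the test set is evaluated exactly once at the end. All function names and the synthetic data are made up for illustration.

```python
import random

def make_samples(n, rng, noise=0.15):
    """Toy 1-D data: the label is True when a noisy copy of x exceeds 0.5."""
    samples = []
    for _ in range(n):
        x = rng.random()
        samples.append((x, x + rng.gauss(0, noise) > 0.5))
    return samples

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (k is odd)."""
    nearest = sorted(train, key=lambda s: abs(s[0] - x))[:k]
    votes = sum(1 for _, label in nearest if label)
    return 2 * votes > k

def accuracy(train, samples, k):
    """Fraction of samples whose k-NN prediction matches the true label."""
    hits = sum(knn_predict(train, x, k) == label for x, label in samples)
    return hits / len(samples)

def select_and_test(train, val, test, ks=(1, 3, 5, 9, 15, 31)):
    # Validation: compare candidate settings on data the model was not fit to.
    val_scores = {k: accuracy(train, val, k) for k in ks}
    best_k = max(val_scores, key=val_scores.get)
    # Test: evaluate the single chosen model once on untouched data.
    return best_k, val_scores, accuracy(train, test, best_k)

rng = random.Random(0)
train, val, test = (make_samples(n, rng) for n in (1000, 200, 500))
best_k, val_scores, test_acc = select_and_test(train, val, test)
print(f"chosen k={best_k}, val acc={val_scores[best_k]:.3f}, test acc={test_acc:.3f}")
```

The same pattern carries over to a real network: you watch the validation accuracy while training and choose the settings or snapshot based on it, and only then run the held-out test set to estimate real-world performance.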
Upvotes: 0
Reputation: 1342
There are three types of datasets.
Training set - This is the data that the network is trained on.
Testing set - This dataset is used to verify that the network is not overfitting the training set and that it is regularised.
Validation set - Since we actually use the testing set during training (to check the regularisation), it is advisable to keep a separate set that the network has not seen until now. Running the network on this set tells us how it will perform when it is used in the real world.
In your case, you should make lmdb files for all three. During training, use the training and testing sets. After training, use the validation set to confirm that the trained network is accurate.
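One common way to build the lmdb files is Caffe's `convert_imageset` tool, which reads a text file with one `relative/path label` pair per line. Below is a minimal sketch for generating one such list file per split, assuming a hypothetical layout `root/<split>/<class_name>/<image>`. The helper names, the folder layout and the extension set are assumptions for illustration; labels are taken from the sorted class folders of the training split so that every split uses the same label numbering. An unlabeled test set (e.g. a flat folder of images) would need a different scanner.

```python
import os

# Hypothetical set of file extensions treated as images.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}

def class_index(root, split="train"):
    """Map class-folder names in root/split to integer labels 0, 1, 2, ..."""
    split_dir = os.path.join(root, split)
    classes = sorted(
        d for d in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, d))
    )
    return {name: i for i, name in enumerate(classes)}

def list_lines(root, split, labels):
    """Return 'relative/path label' lines for every image in root/split."""
    lines = []
    split_dir = os.path.join(root, split)
    for name in sorted(labels):
        class_dir = os.path.join(split_dir, name)
        if not os.path.isdir(class_dir):
            continue  # a split may not contain every class
        for fname in sorted(os.listdir(class_dir)):
            if os.path.splitext(fname)[1].lower() in IMAGE_EXTS:
                # Paths are relative to root/split, the folder passed to the tool.
                lines.append(f"{name}/{fname} {labels[name]}")
    return lines

def write_list_files(root, splits=("train", "val", "test")):
    """Write root/<split>.txt for each split and return their paths."""
    labels = class_index(root)  # shared numbering across all splits
    written = {}
    for split in splits:
        path = os.path.join(root, f"{split}.txt")
        with open(path, "w") as f:
            f.writelines(line + "\n" for line in list_lines(root, split, labels))
        written[split] = path
    return written
```

Each list file would then be converted once, with something like `convert_imageset --shuffle --backend=lmdb root/train/ root/train.txt train_lmdb`, and likewise for val and test, so the test images end up in their own lmdb that is only read after training.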
Upvotes: 3