Reputation: 578
I'm following an article which says the test folder should contain a single folder holding all the test images (with no subfolders or label folders), while the train and validation folders should each contain ‘n’ folders, one per class, holding the images of that class. For example:
Structure 1
/Data
    /train
        classA folder
        classB folder
        classC folder
    /val
        classA folder
        classB folder
        classC folder
    /test
        test folder
I also learned about the Python library split-folders, which splits the data into the following structure:
Structure 2
/Data
    /train
        classA folder
        classB folder
        classC folder
    /val
        classA folder
        classB folder
        classC folder
    /test
        classA folder
        classB folder
        classC folder
I went with Structure 2 (using the split-folders library) and evaluated the model as follows:
model.evaluate(test_generator, batch_size=32)
Here I only passed test_generator (which I got from flow_from_directory) to evaluate, without passing any labels, and I got an accuracy of around 88%. My confusions are:
Upvotes: 1
Views: 568
Reputation: 856
Structure 2 just splits whatever is available, which is fundamentally correct. In practice, you'll most likely end up with Structure 1 for the test set when using flow_from_directory(). You can't run evaluate() without labels, so your test_generator is more akin to a validation set: it was built from labelled class subfolders, the same way the validation generator was. You can still evaluate on that "test" data, since both are created the same way; ideally they are just used differently.

flow_from_directory() returns a DirectoryIterator that yields batches of images together with their labels. The labels come from the subfolder names: classes holds the integer label of every image, and class_indices is a dictionary mapping each class name to its integer index. When you pass this object to evaluate(), the method uses both the images and the labels it yields, which is why you did not have to supply labels separately. If the variable is x, the per-image labels are in x.classes and the name-to-index mapping is in x.class_indices. When you pass it to predict(), only the images are used and any labels are ignored; for an unlabelled test folder (Structure 1) you would typically create the generator with class_mode=None so that it yields images only. Unless you need to manually retrieve something from the iterator, you don't need to access anything inside it when evaluating or predicting.

split-folders does not do anything with your model. The class-named subfolders in train and val exist so that each image can be compared against its label (the folder name is the class). This is how flow_from_directory() keeps track of each image's class. Since prediction is supposed to run on unseen data, there is no label to compare against, hence no labels (and no class subfolders) in a test folder laid out as in Structure 1.
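To make the folder-to-label bookkeeping concrete, here is a minimal pure-Python sketch of the idea. It is not Keras code, only a simplified stand-in for what flow_from_directory() does with subfolder names: the helpers index_image_folder and make_demo_tree, the file names, and the extension list are all assumptions made up for this illustration. It builds both test layouts from the question with empty placeholder files and shows what labels fall out of each.

```python
import tempfile
from pathlib import Path

# Assumed set of image extensions, for illustration only.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def index_image_folder(root):
    """Return (class_indices, filenames, labels) for a directory of subfolders.

    Each subfolder name is treated as a class; names are sorted so that the
    name-to-index mapping is deterministic.
    """
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_indices = {name: i for i, name in enumerate(class_names)}
    filenames, labels = [], []
    for name in class_names:
        for path in sorted((root / name).iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                filenames.append(f"{name}/{path.name}")
                labels.append(class_indices[name])
    return class_indices, filenames, labels


def make_demo_tree(root, layout):
    """Create empty placeholder files; layout maps subfolder name -> file names."""
    for folder, file_names in layout.items():
        (Path(root) / folder).mkdir(parents=True)
        for file_name in file_names:
            (Path(root) / folder / file_name).touch()


with tempfile.TemporaryDirectory() as tmp:
    # Structure 2: the test split keeps one subfolder per class.
    labelled_root = Path(tmp) / "structure2_test"
    make_demo_tree(
        labelled_root,
        {"classA": ["a1.jpg", "a2.jpg"], "classB": ["b1.jpg"], "classC": ["c1.jpg"]},
    )
    labelled = index_image_folder(labelled_root)

    # Structure 1: every test image sits in one folder with no class meaning.
    unlabelled_root = Path(tmp) / "structure1_test"
    make_demo_tree(unlabelled_root, {"test": ["img1.jpg", "img2.jpg", "img3.jpg"]})
    unlabelled = index_image_folder(unlabelled_root)

print(labelled[0])    # name-to-index mapping for Structure 2
print(labelled[2])    # one integer label per image
print(unlabelled[0])  # Structure 1 yields a single pseudo-class
print(unlabelled[2])
```

With the Structure 2 layout every image gets a real label from its folder, so an accuracy can be computed against it. With the Structure 1 layout every image ends up in the same pseudo-class, so those "labels" carry no information and the folder is only useful for prediction.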
Another common approach is to split the data only into train and test sets, then carve the validation set out of the training set. Both approaches are fundamentally the same.
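As a rough sketch of carving a validation set out of the training set, the snippet below holds back a fixed fraction of each class's files. The function name split_train_val, the file names, and the 20% fraction are assumptions for illustration; in Keras itself the same idea is available through the validation_split argument of ImageDataGenerator together with subset="training" / subset="validation" in flow_from_directory().

```python
import random


def split_train_val(files_by_class, val_fraction=0.2, seed=0):
    """Hold back val_fraction of each class's files as a validation set.

    Splitting per class keeps the class balance roughly the same in both sets.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    train, val = {}, {}
    for class_name, files in files_by_class.items():
        shuffled = sorted(files)
        rng.shuffle(shuffled)
        n_val = int(len(shuffled) * val_fraction)
        val[class_name] = sorted(shuffled[:n_val])
        train[class_name] = sorted(shuffled[n_val:])
    return train, val


# Hypothetical training files, grouped by class folder.
files_by_class = {
    "classA": [f"a{i}.jpg" for i in range(10)],
    "classB": [f"b{i}.jpg" for i in range(5)],
}

train_files, val_files = split_train_val(files_by_class, val_fraction=0.2)

for name in files_by_class:
    print(name, len(train_files[name]), len(val_files[name]))
```

The test set is left untouched by this step, so it stays unseen until the final evaluation or prediction.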
Upvotes: 2