Reputation: 407
I am building my own generator to use with the fit_generator() and predict_generator() functions from the Keras library. My generator works, but I am wondering whether it has been built correctly, especially for the validation and test sets.
For these two sets, I disable data augmentation, since it is only used during the training phase, but I still use randomness to select data from my inputs. Is it correct to keep selecting data at random for the validation set?
I think it is, but I am not totally sure.
    import random
    import numpy as np
    from keras.utils import to_categorical

    # batch_size, input_shape, num_classes and rgb_processing() are defined elsewhere
    def generator(inputs, labels, validation=False):
        indexes = []
        while True:
            # allocate fresh arrays so batches already yielded are not overwritten
            batch_inputs = np.zeros((batch_size, *input_shape))
            batch_labels = np.zeros((batch_size, num_classes))
            for i in range(batch_size):
                if validation:
                    # walk through every sample once, then start over
                    if not indexes:
                        indexes = list(range(len(inputs)))
                    index = indexes.pop()
                else:
                    # choose a random index in inputs
                    index = random.randint(0, len(inputs) - 1)
                # validation=True disables the data augmentation processing
                batch_inputs[i] = rgb_processing(inputs[index], validation)
                batch_labels[i] = to_categorical(labels[index], num_classes=num_classes)
            yield batch_inputs, batch_labels

    train_batches = generator(train.X.values, train.y.values)
    validate_batches = generator(validate.X.values, validate.y.values, validation=True)
Upvotes: 1
Views: 228
Reputation: 1249
During validation, the order of the images should not affect your results. So in theory, there is no problem with feeding the validation images in a random order. You just need to make sure that every validation image is used exactly once per validation pass, so that your results are comparable.
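One way to get "random order, but each image exactly once" is to shuffle the indices once per pass and then walk through them in batches. The following is only a minimal sketch of that idea, not the asker's code: `validation_batches` and `steps_per_pass` are hypothetical helper names, the inputs are assumed to be NumPy arrays, and preprocessing and one-hot encoding are left out.

```python
import math
import numpy as np

def validation_batches(inputs, labels, batch_size, shuffle=True, seed=None):
    """Yield (inputs, labels) batches covering every sample exactly once per pass.

    Hypothetical helper; assumes `inputs` and `labels` are NumPy arrays
    of the same length.
    """
    rng = np.random.default_rng(seed)
    n = len(inputs)
    while True:  # generators used with Keras are expected to loop forever
        # A fresh permutation per pass: random order, no repeats, no omissions
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]  # the last batch may be smaller
            yield inputs[idx], labels[idx]

def steps_per_pass(n_samples, batch_size):
    """Number of batches needed to see every sample once (hypothetical helper)."""
    return math.ceil(n_samples / batch_size)
```

With a generator like this, the number of validation steps would be set to `steps_per_pass(len(inputs), batch_size)` (for example via the `validation_steps` argument of fit_generator(), or `steps` for predict_generator()), so that one evaluation corresponds to exactly one pass over the set.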
Upvotes: 1