Eror

Reputation: 621

Using multiple validation sets with keras

I am training a model with Keras using the model.fit() method. I would like to use multiple validation sets that are each evaluated separately after every training epoch, so that I get one loss value per validation set. If possible, the losses should both be displayed during training and be returned by the keras.callbacks.History() callback.

I am thinking of something like this:

history = model.fit(train_data, train_targets,
                    epochs=epochs,
                    batch_size=batch_size,
                    validation_data=[
                        (validation_data1, validation_targets1), 
                        (validation_data2, validation_targets2)],
                    shuffle=True)

I currently have no idea how to implement this. Is it possible to achieve this by writing my own Callback? Or how else would you approach this problem?

Upvotes: 26

Views: 9784

Answers (4)

user2458922

Reputation: 1721

Let's say your data and preprocessing look like this:

import numpy as np
from tensorflow import keras

# Model / data parameters
num_classes = 10
input_shape = (28, 28, 1)

# Load the data and split it between train and validation sets
(x_train, y_train), (x_valid, y_valid) = keras.datasets.mnist.load_data()
y_train_cp = y_train
y_valid_cp = y_valid

# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_valid = x_valid.astype("float32") / 255
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_valid = np.expand_dims(x_valid, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_valid.shape[0], "validation samples")


# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_valid = keras.utils.to_categorical(y_valid, num_classes)
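
The snippets below also use datagen_train and datagen_valid, which this answer does not define. A minimal sketch, assuming they are plain Keras ImageDataGenerator instances:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Assumed generators; the original answer does not show their definitions
datagen_train = ImageDataGenerator()
datagen_valid = ImageDataGenerator()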

You can create multiple validation sets like this:

from sklearn.model_selection import train_test_split

# Hold out a 25% subset of the training data as an extra "validation" set
_, x_train_subset, _, y_train_subset = train_test_split(x_train, y_train, test_size=0.25, random_state=42)

# Boolean masks selecting the validation samples of each digit class
validationMasks = {}
for i in range(10):
    validationMasks[i] = y_valid_cp == i

# Include a training subset among the validation sets to gauge overfitting
validationSets = {}
validationSets['train_sub'] = datagen_valid.flow(x_train_subset, y_train_subset)

# Special-case validation sets: one per selected digit
validationSets['1'] = datagen_train.flow(x_valid[validationMasks[1]], y_valid[validationMasks[1]])  # least confusing digit
validationSets['0'] = datagen_train.flow(x_valid[validationMasks[0]], y_valid[validationMasks[0]])
validationSets['6'] = datagen_train.flow(x_valid[validationMasks[6]], y_valid[validationMasks[6]])
validationSets['8'] = datagen_train.flow(x_valid[validationMasks[8]], y_valid[validationMasks[8]])

Now you can use a custom callback like this:

class MultipleValidationCallBack(keras.callbacks.Callback):

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # Evaluate each extra validation set and record its loss/accuracy
        # in the logs dict so the History callback picks them up
        for thisValidationType in validationSets:
            thisLoss = self.model.evaluate(validationSets[thisValidationType])
            logs[thisValidationType + "_loss"] = thisLoss[0]
            logs[thisValidationType + "_acc"] = thisLoss[1]

Fitting the model can then look like this:

import pandas as pd

# getModel(), datagenFlow_train and datagenFlow_valid are assumed to be
# defined elsewhere (a model factory and training/validation generators)
modelLog = getModel().fit(
        datagenFlow_train, validation_data=datagenFlow_valid,
        callbacks=[MultipleValidationCallBack()],
        epochs=5)
historyDf = pd.DataFrame(modelLog.history)
historyDf[['train_sub_loss', 'val_loss', '0_loss', '1_loss']].plot()

For details, please refer to the end-to-end workable code, and to a post on Medium for the explanation.

Upvotes: 0

Eror

Reputation: 621

I ended up writing my own Callback based on the History callback to solve the problem. I'm not sure if this is the best approach, but the following Callback records losses and metrics for the training and validation set (like the History callback does), as well as losses and metrics for the additional validation sets passed to the constructor.

from keras.callbacks import Callback


class AdditionalValidationSets(Callback):
    def __init__(self, validation_sets, verbose=0, batch_size=None):
        """
        :param validation_sets:
        a list of 3-tuples (validation_data, validation_targets, validation_set_name)
        or 4-tuples (validation_data, validation_targets, sample_weights, validation_set_name)
        :param verbose:
        verbosity mode, 1 or 0
        :param batch_size:
        batch size to be used when evaluating on the additional datasets
        """
        super(AdditionalValidationSets, self).__init__()
        self.validation_sets = validation_sets
        for validation_set in self.validation_sets:
            if len(validation_set) not in [3, 4]:
                raise ValueError('validation_sets must contain 3- or 4-tuples')
        self.epoch = []
        self.history = {}
        self.verbose = verbose
        self.batch_size = batch_size

    def on_train_begin(self, logs=None):
        self.epoch = []
        self.history = {}

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.epoch.append(epoch)

        # record the same values as History() as well
        for k, v in logs.items():
            self.history.setdefault(k, []).append(v)

        # evaluate on the additional validation sets
        for validation_set in self.validation_sets:
            if len(validation_set) == 3:
                validation_data, validation_targets, validation_set_name = validation_set
                sample_weights = None
            elif len(validation_set) == 4:
                validation_data, validation_targets, sample_weights, validation_set_name = validation_set
            else:
                raise ValueError('validation_sets must contain 3- or 4-tuples')

            results = self.model.evaluate(x=validation_data,
                                          y=validation_targets,
                                          verbose=self.verbose,
                                          sample_weight=sample_weights,
                                          batch_size=self.batch_size)

            for metric, result in zip(self.model.metrics_names,results):
                valuename = validation_set_name + '_' + metric
                self.history.setdefault(valuename, []).append(result)

which I am then using like this:

history = AdditionalValidationSets([(validation_data2, validation_targets2, 'val2')])
model.fit(train_data, train_targets,
          epochs=epochs,
          batch_size=batch_size,
          validation_data=(validation_data1, validation_targets1),
          callbacks=[history],
          shuffle=True)
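
After training, the recorded values can be read from the callback's history dict; the keys follow the '<set_name>_<metric>' pattern built in on_epoch_end, so the 'val2' set above yields, for example:

# Metric names come from model.metrics_names, so e.g. 'val2_loss'
# and 'val2_acc' (or 'val2_accuracy', depending on the Keras version)
print(history.history['val2_loss'])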

Upvotes: 36

Nima Aghli

Reputation: 484

I tested this on TensorFlow 2 and it worked. You can evaluate on as many validation sets as you want at the end of each epoch:

class MyCustomCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        res_eval_1 = self.model.evaluate(X_test_1, y_test_1, verbose=0)
        res_eval_2 = self.model.evaluate(X_test_2, y_test_2, verbose=0)
        print(res_eval_1)
        print(res_eval_2)

And later:

my_val_callback = MyCustomCallback()
# Your model creation code
model.fit(..., callbacks=[my_val_callback])
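
The snippet above only prints the results. If you also want them recorded by the History callback (as the question asks), a minimal sketch of one option is to write them into the logs dict; MyRecordingCallback is a hypothetical name, and X_test_1 / y_test_1 are the same assumed arrays as above:

class MyRecordingCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # Store each metric under a prefixed key so it shows up
        # in the history returned by model.fit()
        results = self.model.evaluate(X_test_1, y_test_1, verbose=0)
        for name, value in zip(self.model.metrics_names, results):
            logs['val1_' + name] = value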

Upvotes: 5

mss

Reputation: 368

According to the current Keras docs, you can pass callbacks to evaluate() and evaluate_generator(). So you can call evaluate() multiple times with different datasets.

I have not tested it, so I am happy if you comment your experiences with it below.
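
A minimal sketch of that idea, assuming validation_sets is a list of (x, y, name) tuples and the model is already compiled and trained:

# Evaluate each set separately and print its metrics by name
for x_val, y_val, name in validation_sets:
    results = model.evaluate(x_val, y_val, verbose=0)
    print(name, dict(zip(model.metrics_names, results)))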

Upvotes: 0
