When training a model with TensorFlow 2 as shown below, should the validation data be separated from the training data before it is passed to the model's fit method, or can it be part of the training set? At the end of the code below I show two options. I believe option 1 is correct, but since I have seen some sources use option 2, I want to make sure I understand this correctly.
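from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# df_x (pandas DataFrame of features) and series_y (pandas Series of 0/1 labels) are loaded earlier
X_train, X_test, y_train, y_test = train_test_split(df_x, series_y)
best_weight_path = 'best_weights.hdf5'
# Whole data set as NumPy arrays
numpy_x = df_x.to_numpy()
numpy_y = series_y.to_numpy()
# Training and test partitions as NumPy arrays
numpy_x_train = X_train.to_numpy()
numpy_y_train = y_train.to_numpy()
numpy_x_test = X_test.to_numpy()
numpy_y_test = y_test.to_numpy()
# Small two-class classifier; input size = number of feature columns
model = Sequential()
model.add(Dense(32, input_dim=numpy_x.shape[1], activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(2, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
# Stop when val_loss has not improved by at least 1e-3 for 5 consecutive epochs
monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, verbose=1, mode='auto')
# Keep only the model with the lowest val_loss seen so far
checkpointer = ModelCheckpoint(filepath=best_weight_path, verbose=0, save_best_only=True)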
Option 1:
model.fit(numpy_x_train, numpy_y_train, validation_data=(numpy_x_test, numpy_y_test), callbacks=[monitor, checkpointer], verbose=0, epochs=1000)
Option 2:
model.fit(numpy_x, numpy_y, validation_data=(numpy_x_test, numpy_y_test), callbacks=[monitor, checkpointer], verbose=0, epochs=1000)
The first option is correct: you split the data beforehand, use the training part to fit the model, and evaluate on the test/validation part.
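The difference between the two options can be checked by comparing which rows are fed to `fit` as training data and which are passed as `validation_data`. Below is a minimal NumPy sketch, assuming a random 75/25 split of row indices (the same default proportion as scikit-learn's `train_test_split`; the permutation-based `split_indices` helper is a hypothetical stand-in, not scikit-learn's implementation).

```python
import numpy as np

def split_indices(n_rows, test_fraction=0.25, seed=0):
    """Hypothetical helper: randomly partition row indices into (train, test)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_rows)
    n_test = int(np.ceil(n_rows * test_fraction))
    return perm[n_test:], perm[:n_test]

n_rows = 100
train_idx, test_idx = split_indices(n_rows)

# Option 1: fit on the training rows only.
option1_fit_rows = set(train_idx.tolist())
# Option 2: fit on ALL rows.
option2_fit_rows = set(range(n_rows))
# Both options validate on the same held-out rows.
val_rows = set(test_idx.tolist())

# How many validation rows the model has also trained on.
overlap_option1 = len(option1_fit_rows & val_rows)
overlap_option2 = len(option2_fit_rows & val_rows)
print(overlap_option1, overlap_option2, len(val_rows))  # prints: 0 25 25
```

With option 2 every validation row is also a training row, so `val_loss` (and therefore `EarlyStopping` and `ModelCheckpoint`) tracks performance on data the model has already seen.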
The second option is not correct: you pass all your data for training while passing a subset of the same data for evaluation. Keras cannot detect this overlap, so the validation loss is computed on samples the model is also trained on and will look optimistic. To get what you are looking for with the second option, use validation_split=0.xxx instead:
model.fit(numpy_x, numpy_y, validation_split=0.xxx, callbacks=[monitor, checkpointer], verbose=0, epochs=1000)
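In other words, you pass ALL your data to fit and Keras holds out the fraction 0.xxx of it (a number between 0 and 1, not a percentage) for validation. Note that this hold-out is not random: Keras takes the last samples of the arrays, before any shuffling, so shuffle the data yourself first if it is ordered.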
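To make this behaviour concrete, here is a NumPy sketch that mimics what `fit` does with `validation_split`. It assumes the cut point `floor(n * (1 - validation_split))` used by the Keras data adapter in TF 2.x; `keras_like_validation_split` is a hypothetical helper written for illustration, not a Keras function.

```python
import math
import numpy as np

def keras_like_validation_split(x, y, validation_split):
    """Hypothetical helper: hold out the LAST fraction of rows, as fit(validation_split=...) does."""
    split_at = int(math.floor(len(x) * (1.0 - validation_split)))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

# Ordered toy data: labels are sorted, which is the case where a tail hold-out is biased.
x = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([0] * 5 + [1] * 5)

(_, _), (x_val, y_val) = keras_like_validation_split(x, y, validation_split=0.2)
print(x_val.ravel(), y_val)  # only the last two rows, both labelled 1

# Shuffle x and y with the same permutation before fit, so the held-out tail is a random sample.
rng = np.random.default_rng(42)
perm = rng.permutation(len(x))
x_shuffled, y_shuffled = x[perm], y[perm]
(x_tr, y_tr), (x_val2, y_val2) = keras_like_validation_split(x_shuffled, y_shuffled, 0.2)
```

In the question's code this corresponds to shuffling `numpy_x` and `numpy_y` with one shared permutation before calling `model.fit(..., validation_split=0.xxx)`.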