Reputation: 2189
I am using the ModelCheckpoint callback from Keras:
checkpointer = ModelCheckpoint(filepath=model_filepath,
                               verbose=1,
                               save_best_only=True)
I cannot train my model in one go, so I have to save and load it several times and resume training to keep improving it. However, whenever I load the model and resume training, the previously saved checkpoint is overwritten at the end of the first epoch, because the callback's best val_loss resets to inf and the first epoch's val_loss (say 0.23) counts as an improvement. But the best val_loss from my previous training session was 0.19 (0.19 < 0.23, so the previous model is still the best and should not be overwritten).
How can I make Keras take the best val_loss from the previous training session into account and stop overwriting the better checkpoint?
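To make the setup concrete, this is roughly the resume loop I use (a minimal sketch; model_filepath and the data variables are placeholders):
from tensorflow.keras.models import load_model
from tensorflow.keras.callbacks import ModelCheckpoint

model = load_model(model_filepath)  # load the previously saved best model
checkpointer = ModelCheckpoint(filepath=model_filepath,
                               verbose=1,
                               save_best_only=True)
# The callback restarts with best = inf, so the first epoch's val_loss (e.g. 0.23)
# overwrites the checkpoint even though 0.19 was already achieved earlier.
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=10,
          callbacks=[checkpointer])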
Upvotes: 0
Views: 1442
Reputation: 11
import pandas as pd
from tensorflow.keras.callbacks import ModelCheckpoint, LambdaCallback, CSVLogger
from tensorflow.keras.models import load_model

work_dir = "drive/My Drive/Training Records/"
NUM_EPOCHS = 300
checkpointer_name = "model_checkpoint.hdf5"
log_name = "log_" + checkpointer_name[:-5] + ".log"
Step 1: define two checkpointers, one that saves the latest model after every epoch (for resuming) and one that saves only the best model.
# Saves the latest model after every epoch, so training can always be resumed.
checkpointer = ModelCheckpoint(filepath=work_dir + checkpointer_name,
                               monitor='val_loss',
                               mode='auto',
                               verbose=0,
                               save_best_only=False)

# Saves only the best model; used only in the very first training session.
checkpointer_best = ModelCheckpoint(filepath=work_dir + "best_" + checkpointer_name,
                                    monitor='val_loss',
                                    mode='auto',
                                    verbose=1,
                                    save_best_only=True)
Step 2: a custom check that compares the current epoch against the best metrics recorded in the CSV log across all previous sessions.
def checkBestPerformance(epoch, logs):
    # Read the best metrics achieved so far from the persistent CSV log.
    log_data = pd.read_csv(work_dir + log_name, sep=',',
                           usecols=['val_loss', 'val_accuracy'], engine='python')
    min_val_loss = min(log_data.val_loss.values)
    max_val_acc = max(log_data.val_accuracy.values)

    current_val_acc = logs['val_accuracy']
    current_val_loss = logs['val_loss']
    save_filepath = work_dir + "best_" + checkpointer_name

    if current_val_loss < min_val_loss:
        model.save(filepath=save_filepath)
        print("\nval_loss decreased from", min_val_loss, "to", current_val_loss, ".")
    elif (current_val_loss == min_val_loss) and (current_val_acc > max_val_acc):
        model.save(filepath=save_filepath)
        print("\nval_accuracy increased from", max_val_acc, "to", current_val_acc, ".")
    else:
        print("\nPerformance did not improve from existing min_val_loss =",
              min_val_loss, ", max_val_acc =", max_val_acc, ".")
Step 3: if a log file already exists, resume from the latest checkpoint and guard the best model with the LambdaCallback above; otherwise start fresh with the save_best_only checkpointer.
epochs_completed = 0
csv_logger = CSVLogger(work_dir + log_name, separator=',', append=True)

# Default callbacks for a fresh run (no previous log or checkpoint yet).
# On a fresh run, `model` is assumed to be built and compiled earlier.
list_callbacks = [checkpointer, checkpointer_best, csv_logger]

try:
    log_data = pd.read_csv(work_dir + log_name, sep=',', usecols=['epoch'], engine='python')
    epochs_completed = log_data.shape[0]
    if epochs_completed > 0:
        # Resume from the last saved model; the LambdaCallback compares
        # against the logged history instead of an in-memory best.
        model = load_model(work_dir + checkpointer_name)
        list_callbacks = [checkpointer, LambdaCallback(on_epoch_end=checkBestPerformance), csv_logger]
    print("epochs_completed =", epochs_completed)
except (FileNotFoundError, pd.errors.EmptyDataError):
    # No usable log yet: keep the fresh-run callbacks defined above.
    pass
Step 4: train for the remaining epochs.
print("Previously completed epochs =", epochs_completed, "\n")
history = model.fit(final_train_imageset, final_train_label,
shuffle=True,
batch_size = BATCH_SIZE,
epochs = NUM_EPOCHS - epochs_completed,
validation_split = 0.1,
callbacks=list_callbacks
)
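The idea behind this setup is that the CSV log acts as a persistent record of every session's metrics, so checkBestPerformance compares the current epoch against that whole history rather than against ModelCheckpoint's in-memory best, which resets to inf each time training restarts.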
Upvotes: 1
Reputation: 119
Since ModelCheckpoint is not designed to carry the best value across training sessions, I would not consider this behavior wrong.
I would suggest changing the filepath parameter of the callback whenever you resume training; that way you at least do not lose the previous best model. A sketch of this idea follows.
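For example, a minimal sketch of that idea (the run_id naming scheme is just one illustrative option):
from tensorflow.keras.callbacks import ModelCheckpoint

run_id = 2  # increment this (or use a timestamp) every time you resume training
checkpointer = ModelCheckpoint(
    filepath="model_best_run{}.hdf5".format(run_id),  # new file per session, earlier bests survive
    monitor='val_loss',
    verbose=1,
    save_best_only=True)
# After all sessions, compare the saved files' validation losses and keep the best one.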
Upvotes: 1