Bluetail

Reputation: 1291

How to save checkpoints as files at every epoch and then load the weights from the latest saved one in TensorFlow 2?

When I run the following code, folders named cp_01, cp_02, and so on are created, whereas I want to save a checkpoint file at every epoch. I then want to use the latest saved checkpoint file to load the weights into my model instance with model.load_weights(tf.train.latest_checkpoint('model_checkpoints_5000')).

How can I do this?

import os
import tensorflow as tf
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D

# Use the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train = x_train / 255.0
x_test = x_test / 255.0

# Use smaller subset -- speeds things up
x_train = x_train[:10000]
y_train = y_train[:10000]
x_test = x_test[:1000]
y_test = y_test[:1000]

# define a function that creates a new instance of a simple CNN.
def create_model():
    model = Sequential([
        Conv2D(filters=16, input_shape=(32, 32, 3), kernel_size=(3, 3), 
               activation='relu', name='conv_1'),
        Conv2D(filters=8, kernel_size=(3, 3), activation='relu', name='conv_2'),
        MaxPooling2D(pool_size=(4, 4), name='pool_1'),
        Flatten(name='flatten'),
        Dense(units=32, activation='relu', name='dense_1'),
        Dense(units=10, activation='softmax', name='dense_2')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model


checkpoint_5000_path = './model_checkpoints_5000/cp_{epoch:02d}'
checkpoint_5000 = ModelCheckpoint(filepath = checkpoint_5000_path,
                                 save_weights= True,
                                 save_freq = 'epoch',
                                 verbose = 1)


model = create_model()
model.fit(x = x_train,
          y = y_train,
          epochs = 3,
          validation_data = (x_test, y_test),
          batch_size = 10,
          callbacks = [checkpoint_5000])

My output is the following.

Epoch 00001: saving model to ./model_checkpoints_5000\cp_01
INFO:tensorflow:Assets written to: ./model_checkpoints_5000\cp_01\assets
Epoch 2/3
1000/1000 [==============================] - 3s 3ms/step - loss: 1.4493 - accuracy: 0.4744 - val_loss: 1.4664 - val_accuracy: 0.4770

I have tried adding .h5 to the file path:

'./model_checkpoints_5000/cp_{epoch:02d}.h5'

However, tf.train.latest_checkpoint('model_checkpoints_5000') then returns None, whereas I expected it to return the file name cp_03.h5.

Upvotes: 0

Views: 756

Answers (1)

user11530462

After training the model, list the contents of the checkpoint directory with the code below:

checkpoint_dir = os.path.dirname(checkpoint_5000_path)
os.listdir(checkpoint_dir)

Output:

['cp_01',
 'cp_00.h5',
 'cp_03',
 'cp_00.data-00000-of-00001',
 'cp_00.index',
 'cp_03.h5',
 'cp_02',
 'cp_01.h5',
 'cp_02.h5',
 'checkpoint']

Please check this link for more details.
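
As a possible follow-up to the directory listing above: tf.train.latest_checkpoint relies on the `checkpoint` state file that is written alongside TensorFlow-format checkpoints, so a directory that only contains .h5 files yields None. Below is a minimal sketch, using only the Python standard library, of picking the highest-numbered .h5 file yourself. The helper name latest_h5_checkpoint and its default cp_ prefix are assumptions made for illustration; they are not part of TensorFlow.

```python
import re
from pathlib import Path
from typing import Optional


def latest_h5_checkpoint(checkpoint_dir: str, prefix: str = "cp_") -> Optional[str]:
    """Return the path of the '<prefix><epoch>.h5' file with the highest epoch.

    Hypothetical helper (not a TensorFlow API). Returns None when the
    directory does not exist or contains no matching file.
    """
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return None

    # Match names such as 'cp_03.h5' and capture the epoch number.
    pattern = re.compile(re.escape(prefix) + r"(\d+)\.h5")

    best_epoch, best_path = -1, None
    for entry in directory.iterdir():
        match = pattern.fullmatch(entry.name)
        # Skip folders (e.g. SavedModel directories) and unrelated files.
        if match is None or not entry.is_file():
            continue
        epoch = int(match.group(1))  # compare numerically, not as strings
        if epoch > best_epoch:
            best_epoch, best_path = epoch, str(entry)
    return best_path
```

With the files from the question, the result could then be passed on as model.load_weights(latest_h5_checkpoint('model_checkpoints_5000')). Separately, note that the ModelCheckpoint argument documented for saving only the weights is save_weights_only, whereas the code in the question passes save_weights; that may be why whole-model folders are written instead of weight files.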

Upvotes: 1
