Riley Fitzpatrick

Reputation: 919

Tensorflow & Keras can't load .ckpt save

So I am using the ModelCheckpoint callback to save the best epoch of a model I am training. It saves with no errors, but when I try to load it, I get the error:

2019-07-27 22:58:04.713951: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open C:\Users\Riley\PycharmProjects\myNN\cp.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

I have tried using the absolute/full path, but no luck. I'm sure I could use EarlyStopping, but I'd still like to understand why I am getting the error. Here is my code:

from __future__ import absolute_import, division, print_function

import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
import datetime
import statistics

(train_images, train_labels), (test_images, test_labels) = np.load("dataset.npy", allow_pickle=True)

train_images = train_images / 255
test_images = test_images / 255

train_labels = list(map(float, train_labels))
test_labels = list(map(float, test_labels))
train_labels = [i/10 for i in train_labels]
test_labels = [i/10 for i in test_labels]

'''
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(128, 128)),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(1)
  ])

'''

start_time = datetime.datetime.now()

model = keras.Sequential([
    keras.layers.Conv2D(32, kernel_size=(5, 5), strides=(1, 1), activation='relu', input_shape=(128, 128, 1)),
    keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    keras.layers.Dropout(0.2),
    keras.layers.Conv2D(64, (5, 5), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Dropout(0.2),
    keras.layers.Flatten(),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1000, activation='relu'),
    keras.layers.Dense(1)

])

model.compile(loss='mean_absolute_error',
    optimizer=keras.optimizers.SGD(lr=0.01),
    metrics=['mean_absolute_error', 'mean_squared_error'])

train_images = train_images.reshape(328, 128, 128, 1)
test_images = test_images.reshape(82, 128, 128, 1)

model.fit(train_images, train_labels, epochs=100, callbacks=[keras.callbacks.ModelCheckpoint("cp.ckpt", monitor='mean_absolute_error', save_best_only=True, verbose=1)])

model.load_weights("cp.ckpt")

predictions = model.predict(test_images)

totalDifference = 0
for i in range(82):
    print("%s: %s" % (test_labels[i] * 10, predictions[i] * 10))
    totalDifference += abs(test_labels[i] - predictions[i])

avgDifference = totalDifference / 8.2

print("\n%s\n" % avgDifference)
print("Time Elapsed:")
print(datetime.datetime.now() - start_time)

Upvotes: 4

Views: 5848

Answers (3)

Sohaib Anwaar

Reputation: 1547

model.load_weights will not work here; the reason is explained in Szymon Maszke's answer below. You can load the weights with the code below: build your model first and then restore the checkpoint into it. I hope this code helps you out.

import tensorflow as tf

# Build the model architecture first, then restore the checkpoint into it.
model = dense_net()
ckpt = tf.train.Checkpoint(step=tf.Variable(1, dtype=tf.int64), net=model)
# tf.train.latest_checkpoint expects the checkpoint *directory*,
# not one of the .data-* shard files.
ckpt.restore(tf.train.latest_checkpoint("/kaggle/working/training_1"))
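
For completeness (an assumption about how the checkpoint was written, not part of the original answer): tf.train.latest_checkpoint only finds checkpoints saved under a directory, for example via tf.train.CheckpointManager. A minimal sketch of the matching save side, with a throwaway stand-in for dense_net():

import tensorflow as tf

# Hypothetical stand-in for dense_net(), just to make the sketch runnable.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

checkpoint_dir = "/kaggle/working/training_1"  # hypothetical directory
ckpt = tf.train.Checkpoint(step=tf.Variable(1, dtype=tf.int64), net=model)
manager = tf.train.CheckpointManager(ckpt, checkpoint_dir, max_to_keep=3)

manager.save()  # writes ckpt-1.index plus ckpt-1.data-* shards
print(tf.train.latest_checkpoint(checkpoint_dir))  # e.g. .../ckpt-1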

Upvotes: 2

Szymon Maszke

Reputation: 24691

TL;DR: you are saving the whole model while trying to load only the weights; that's not how it works.

Explanation

Your model's fit:

model.fit(
    train_images,
    train_labels,
    epochs=100,
    callbacks=[
        keras.callbacks.ModelCheckpoint(
            "cp.ckpt", monitor="mean_absolute_error", save_best_only=True, verbose=1
        )
    ],
)

As save_weights_only=False by default in ModelCheckpoint, you are saving the whole model to .ckpt.

BTW, the file should be named .hdf5 or .h5, as it is Hierarchical Data Format 5. Since Windows is not extension-agnostic, you may run into problems if tensorflow / keras relies on the extension on this OS.

On the other hand, you are loading only the model's weights, while the file contains the whole model:

model.load_weights("cp.ckpt")

Tensorflow's checkpointing (.ckpt) mechanism is different from Keras's (.hdf5), so watch out for that (there are plans to integrate them more closely, see here and here).
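
As a side note (not in the original answer), tf.keras lets you pick between the two weight formats explicitly via the save_format argument of model.save_weights, which makes the difference easy to see. A minimal sketch with a throwaway model:

import tensorflow as tf
from tensorflow import keras

# Minimal model, only to demonstrate the two weight formats.
model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])

# TensorFlow checkpoint format: writes cp.ckpt.index plus cp.ckpt.data-* shards.
model.save_weights("cp.ckpt", save_format="tf")

# Keras HDF5 format: writes a single weights.h5 file.
model.save_weights("weights.h5", save_format="h5")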

Solution

So, either use the callback as you currently do BUT load the whole model back with keras.models.load_model("model.hdf5"), or add the save_weights_only=True argument to ModelCheckpoint:

model.fit(
    train_images,
    train_labels,
    epochs=100,
    callbacks=[
        keras.callbacks.ModelCheckpoint(
            "weights.hdf5",
            monitor="mean_absolute_error",
            save_best_only=True,
            verbose=1,
            save_weights_only=True,  # Specify this
        )
    ],
)

and then you can use model.load_weights("weights.hdf5").
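
Alternatively, if you keep the default save_weights_only=False, a minimal sketch of loading the whole saved model back (assuming the callback wrote it as model.hdf5, as above):

from tensorflow import keras

# Rebuilds the architecture, weights and optimizer state from the single HDF5 file.
model = keras.models.load_model("model.hdf5")
# Then predict as usual, e.g. with the test_images from the question:
# predictions = model.predict(test_images)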

Upvotes: 6

pedro_bb7

Reputation: 2091

import tensorflow as tf

# Create some variables.
v1 = tf.Variable(tf.random_normal([784, 200], stddev=0.35), name="v1")
v2 = tf.Variable(tf.random_normal([784, 200], stddev=0.35), name="v2")

# Add an op to initialize the variables.
init_op = tf.global_variables_initializer()

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, initialize the variables, do some work, save the
# variables to disk.
with tf.Session() as sess:
  sess.run(init_op)
  # Do some work with the model.

  # Save the variables to disk.
  save_path = saver.save(sess, "/tmp/model.ckpt")
  print("Model saved in file: %s" % save_path)

# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
  # Restore variables from disk.
  saver.restore(sess, "/tmp/model.ckpt")
  print("Model restored.")
  # Do some work with the model

Source
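
Note that this snippet uses the TensorFlow 1.x Session/Saver API. Under TensorFlow 2.x, a roughly equivalent sketch (an adaptation, not part of the original answer) uses object-based checkpointing:

import tensorflow as tf

# Some variables to checkpoint (same shapes as above, purely for illustration).
v1 = tf.Variable(tf.random.normal([784, 200], stddev=0.35), name="v1")
v2 = tf.Variable(tf.random.normal([784, 200], stddev=0.35), name="v2")

ckpt = tf.train.Checkpoint(v1=v1, v2=v2)

# Save: writes /tmp/tf2_ckpt-1.index plus data shards.
save_path = ckpt.save("/tmp/tf2_ckpt")
print("Checkpoint saved at: %s" % save_path)

# Restore the saved values back into the tracked variables.
ckpt.restore(save_path)
print("Checkpoint restored.")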

Upvotes: 0
