Reputation: 3487
Let's assume we have fitted a model in TensorFlow:
model.fit(
    train_generator,
    epochs=epochs,
    verbose=1,
    steps_per_epoch=steps_per_epoch,
    validation_data=valid_generator,
    validation_steps=val_steps_per_epoch).history
In the next step, we generate predictions.
Y_pred = model.predict(valid_generator, steps=int(np.ceil(valid_generator.samples / valid_generator.batch_size)))
I'm wondering if it is possible to save predictions and load them from disk for debugging subsequent code without retraining the model and predicting the data each time after each restart.
Of course, it is possible to save and load the model, but there is still some overhead on predicting.
Any ideas are highly appreciated. Thanks in advance.
Upvotes: 0
Views: 574
Reputation: 17229
Based on my understanding from the comments, here is a possible solution to your question. Let me know whether it works for you.
I'm wondering if it is possible to save predictions and load them from disk for debugging subsequent code without retraining the model and predicting the data each time after each restart.
First, we build and train a model.
import tensorflow as tf
# Model
inputs = tf.keras.Input(shape=(28, 28))  # renamed from `input` to avoid shadowing the builtin
base_maps = tf.keras.layers.Flatten()(inputs)
base_maps = tf.keras.layers.Dense(128, activation='relu')(base_maps)
base_maps = tf.keras.layers.Dense(units=10, activation='softmax', name='primary')(base_maps)
model = tf.keras.Model(inputs=[inputs], outputs=[base_maps])
# compile
model.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=['accuracy'],
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
)
# data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = tf.divide(x_train, 255)  # scale training pixels to [0, 1]
y_train = tf.one_hot(y_train, depth=10)  # one-hot encode the training labels
# train
model.fit(x_train, y_train, batch_size=512, epochs=3, verbose=1)
Next, we use the trained model to predict on unseen data (x_test) and save the predictions to disk so that we can later debug model performance issues.
import numpy as np
import pandas as pd
y_pred = model.predict(x_test)  # predicted class probabilities (note: x_test is not rescaled to [0, 1] here, unlike x_train)
y_pred = np.argmax(y_pred, axis=-1)  # predicted class labels
# save ground truth and prediction to local disk as CSV file
oof = pd.DataFrame(dict(
    gt = y_test,
    pd = y_pred,
))
oof.to_csv('oof.csv', index=False)
oof.head(20)
# count how many predictions match the ground truth
oof['check'] = np.where((oof['gt'] == oof['pd']), 'Match', 'No Match')
oof.check.value_counts()
Match 9492
No Match 508
Name: check, dtype: int64
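To pick this up after a restart, the saved file can be read back with pandas instead of re-running the model. Here is a minimal sketch, assuming the CSV has the gt and pd columns written above; the small arrays and the temporary path are hypothetical stand-ins so that the snippet runs on its own:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real labels and predictions.
y_test = np.array([7, 2, 1, 0, 4])
y_pred = np.array([7, 2, 1, 0, 9])

# Save once (same layout as oof.csv above), here into a temporary directory.
csv_path = os.path.join(tempfile.mkdtemp(), "oof.csv")
pd.DataFrame({"gt": y_test, "pd": y_pred}).to_csv(csv_path, index=False)

# Later session: reload from disk, no model or prediction step needed.
oof = pd.read_csv(csv_path)
accuracy = (oof["gt"] == oof["pd"]).mean()
print(f"accuracy: {accuracy:.2f}")
```

In a real session only the pd.read_csv line and whatever analysis follows would need to run.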
In this way, we can run various kinds of analysis on the model predictions and the ground truth. To save the predicted probabilities instead of the class labels, we can do something like this:
y_pred = model.predict(x_test)
np.savetxt("y_pred.csv", y_pred , delimiter=",")
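The probabilities saved this way can be read back with np.loadtxt("y_pred.csv", delimiter=","). For a debugging loop, one option is to wrap the prediction call so that it only runs when no saved file exists yet. The sketch below uses NumPy's binary .npy format via np.save and np.load. The names cached_predict and fake_predict and the cache path are hypothetical, introduced only for illustration; fake_predict stands in for model.predict so that the snippet runs without TensorFlow.

```python
import os
import tempfile

import numpy as np


def cached_predict(predict_fn, inputs, cache_path):
    """Load predictions from cache_path if it exists, else compute and save them."""
    if os.path.exists(cache_path):
        return np.load(cache_path)
    preds = np.asarray(predict_fn(inputs))
    np.save(cache_path, preds)
    return preds


calls = []  # records how often the (expensive) prediction actually runs


def fake_predict(x):
    """Hypothetical stand-in for model.predict: uniform scores over 10 classes."""
    calls.append(1)
    return np.full((len(x), 10), 0.1)


cache_path = os.path.join(tempfile.mkdtemp(), "y_pred.npy")
x = np.zeros((4, 28, 28))

first = cached_predict(fake_predict, x, cache_path)   # computes and writes the file
second = cached_predict(fake_predict, x, cache_path)  # loads from disk instead
```

With a real model this would be called as cached_predict(model.predict, x_test, "y_pred.npy"). The cache is not invalidated automatically, so the file has to be deleted whenever the model or the data changes.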
Upvotes: 1