Jürgen K.

Reputation: 3487

Saving predict_generator results in TensorFlow and Python

Let's assume we have fitted a model in TensorFlow:

history = model.fit(
    train_generator,
    epochs=epochs,
    verbose=1,
    steps_per_epoch=steps_per_epoch,
    validation_data=valid_generator,
    validation_steps=val_steps_per_epoch).history

In the next step, we generate predictions.

Y_pred = model.predict_generator(valid_generator, int(np.ceil(valid_generator.samples / valid_generator.batch_size)))

I'm wondering whether it is possible to save the predictions and load them from disk, so that subsequent code can be debugged without retraining the model and re-predicting the data after every restart.

Of course, it is possible to save and load the model, but there is still the overhead of running the prediction again.

Any ideas are highly appreciated. Thanks in advance.

Upvotes: 0

Views: 574

Answers (1)

Innat

Reputation: 17229

Based on my understanding from the comment box, here is a possible solution to your query; let me know whether it works for you.

"I'm wondering whether it is possible to save the predictions and load them from disk, so that subsequent code can be debugged without retraining the model and re-predicting the data after every restart."


First, we build a model and train it.

import tensorflow as tf

# Model
inputs = tf.keras.Input(shape=(28, 28))
base_maps = tf.keras.layers.Flatten()(inputs)
base_maps = tf.keras.layers.Dense(128, activation='relu')(base_maps)
base_maps = tf.keras.layers.Dense(units=10, activation='softmax', name='primary')(base_maps)
model = tf.keras.Model(inputs=[inputs], outputs=[base_maps])

# compile 
model.compile(
    loss = tf.keras.losses.CategoricalCrossentropy(),
    metrics = ['accuracy'],
    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3) )

# data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = tf.divide(x_train, 255)
x_test = tf.divide(x_test, 255)   # apply the same scaling to the test data
y_train = tf.one_hot(y_train, depth=10)

# fit
model.fit(x_train, y_train, batch_size=512, epochs=3, verbose=1)
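
As a side note, since the question also mentions saving and loading the model itself: the trained model can be written to disk and restored later, so training never has to be repeated. A minimal sketch using the standard Keras save/load API (the path 'my_model' is just an example):

# save the trained model to disk
model.save('my_model')

# later, e.g. after a restart, restore it without retraining
model = tf.keras.models.load_model('my_model')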

Next, we use this trained model to predict on unseen data (x_test) and save the predictions to disk, so that we can later debug model performance issues.

import numpy as np
import pandas as pd

y_pred = model.predict(x_test)       # predicted probabilities
y_pred = np.argmax(y_pred, axis=-1)  # predicted class labels

# save ground truth and predictions to local disk as a CSV file
oof = pd.DataFrame(dict(
           gt   = y_test,
           pred = y_pred,
     ))
oof.to_csv('oof.csv', index=False)
oof.head(20)

# compute how many predictions match the ground truth
oof['check'] = np.where((oof['gt'] == oof['pred']), 'Match', 'No Match')
oof.check.value_counts()

which prints something like

Match       9492
No Match     508
Name: check, dtype: int64
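
And to address the saving/loading part of the question directly: after a restart, the saved file can simply be read back from disk, so neither training nor prediction has to be rerun. A small sketch, assuming oof.csv is in the working directory:

import pandas as pd

# reload the saved predictions after a restart
oof = pd.read_csv('oof.csv')
oof.head()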

Like this, we can do various types of analysis on the model predictions and the ground truth. If we want to save the raw probabilities instead of the predicted labels, we can do something like this:

y_pred = model.predict(x_test)                   # predicted probabilities
np.savetxt("y_pred.csv", y_pred, delimiter=",")  # save them as CSV
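
These probabilities can later be loaded back the same way. A minimal sketch; np.save / np.load would also work and additionally preserves the dtype:

# reload the saved probabilities after a restart
y_pred = np.loadtxt("y_pred.csv", delimiter=",")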

Upvotes: 1
