Reputation: 131
I trained a model and now want to evaluate its performance on a test set. The test set is loaded as a tf.data.TFRecordDataset object (built from multiple TFRecord files, each containing multiple examples) and consists of roughly a million examples in the form of (image, label) tuples; the data are batched. The raw labels are then mapped to the target integers (one-hot encoded) that the model needs to predict.
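For reference, the pipeline looks roughly like this (the feature keys, image decoding, number of classes, and batch size here are placeholders for my actual setup):

    import tensorflow as tf

    NUM_CLASSES = 10                    # placeholder
    tfrecord_filenames = ["..."]        # list of my TFRecord file paths

    def parse_example(serialized):
        # placeholder feature spec; my real schema differs
        features = tf.io.parse_single_example(serialized, {
            'image': tf.io.FixedLenFeature([], tf.string),
            'label': tf.io.FixedLenFeature([], tf.int64),
        })
        image = tf.io.decode_jpeg(features['image'])
        label = tf.one_hot(features['label'], depth=NUM_CLASSES)
        return image, label

    test_dataset = (tf.data.TFRecordDataset(tfrecord_filenames)
                    .map(parse_example)
                    .batch(64))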
I understand that I can pass the Dataset object as an input to model.predict(), which will output predictions for each example in the dataset. However, to compute metrics I need to compare the true target values to the predicted ones, and to obtain the former I need to iterate through the Dataset, because that is where all the true labels are stored.
This seems like a common task, but I couldn't find a straightforward solution that works for a huge dataset in TFRecord format. What would be the best way to compute, for instance, per-class AUC in this case? Should I use callbacks with model.predict(test_dataset)? Or should I process each example one by one in a loop, save the true and predicted values into arrays, and then use, for example, sklearn.metrics.roc_auc_score() to compute AUC scores for the two arrays? Or maybe I'm missing some obvious way to do it?
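To illustrate the loop idea, this is roughly what I had in mind (a sketch only; it assumes the model outputs per-class probabilities and that the dataset is iterated eagerly so .numpy() works):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    y_true, y_pred = [], []
    for images, labels in test_dataset:          # batches of (image, one-hot label)
        probs = model.predict_on_batch(images)   # per-class probabilities
        y_true.append(labels.numpy())
        y_pred.append(probs)

    y_true = np.concatenate(y_true)  # shape (num_examples, num_classes)
    y_pred = np.concatenate(y_pred)

    # AUC for each class separately
    per_class_auc = [roc_auc_score(y_true[:, c], y_pred[:, c])
                     for c in range(y_true.shape[1])]

My worry is that collecting everything into arrays like this is wasteful for ~a million examples.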
Thanks in advance!
Upvotes: 3
Views: 1502
Reputation: 1815
If you need all labels, why not just:

model.evaluate(test_dataset.take(-1))

Or, if your dataset is too large for that, just iterate over it, calculate your metric per batch, and take the mean at the end.
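As an example, here is a rough sketch of the iterative approach; instead of averaging per-batch values, it uses one streaming tf.keras.metrics.AUC per class, so the metric accumulates over batches without holding all predictions in memory (num_classes is a placeholder, and it assumes your model outputs per-class probabilities):

    import tensorflow as tf

    num_classes = 10  # adjust to your problem
    aucs = [tf.keras.metrics.AUC() for _ in range(num_classes)]

    for images, labels in test_dataset:
        probs = model(images, training=False)
        for c in range(num_classes):
            aucs[c].update_state(labels[:, c], probs[:, c])

    per_class_auc = [m.result().numpy() for m in aucs]

If you want AUC reported directly by model.evaluate(), you can also compile the model with tf.keras.metrics.AUC(multi_label=True) in its metrics.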
Upvotes: 2