Some student

Reputation: 131

Best way to evaluate performance with tf.data.Dataset

I trained a model and now want to evaluate its performance on a test set. The test set is loaded as a tf.data.TFRecordDataset object (built from multiple TFRecord files, each containing multiple examples) and consists of roughly a million examples in the form of (image, label) tuples; the data are batched. The raw labels are then mapped to the one-hot-encoded target integers that the model needs to predict.

I understand that I can pass the Dataset object to model.predict(), which will output predictions for each example in the dataset. However, to compute a metric I need to compare the true target values to the predicted ones, and to obtain the former I need to iterate through the Dataset, because that is where all the true labels are stored.

This seems like a common task, but I couldn't find a straightforward solution that works for a huge dataset in TFRecord format. What would be the best way to compute, for instance, per-class AUC in this case? Should I use Callbacks with model.predict(test_dataset)? Or should I process the examples one by one in a loop, save the true and predicted values into arrays, and then use, for example, sklearn.metrics.roc_auc_score() to compute AUC scores for the two arrays? Or maybe I'm missing some obvious way to do it?
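For reference, the "collect arrays, then score" option described above can be sketched as follows. This is a minimal runnable sketch: the batches, one-hot labels, and prediction probabilities are fabricated stand-ins for the real test_dataset and model, which are assumed rather than shown.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Sketch of "collect true and predicted values, then score them".
# The fake batches stand in for iterating a batched tf.data.Dataset of
# (image, one_hot_label) tuples; in the real pipeline you would call
# probs = model.predict_on_batch(images) instead of fabricating scores.
rng = np.random.default_rng(0)
n_classes = 3

fake_batches = [
    (rng.normal(size=(6, 8)),                  # stand-in for a batch of images
     np.eye(n_classes)[[0, 1, 2, 0, 1, 2]])    # one-hot labels for the batch
    for _ in range(5)
]

y_true, y_score = [], []
for images, labels in fake_batches:
    probs = rng.dirichlet(np.ones(n_classes), size=len(images))  # fake model output
    y_true.append(labels)
    y_score.append(probs)

y_true = np.concatenate(y_true)
y_score = np.concatenate(y_score)

# average=None returns one AUC per class (one-vs-rest)
auc_per_class = roc_auc_score(y_true, y_score, average=None)
print(auc_per_class)
```

Note that for ~a million examples the two accumulated float arrays are only tens of megabytes; it is the decoded images that must not be kept around, and the loop above discards them batch by batch.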

Thanks in advance!

Upvotes: 3

Views: 1502

Answers (1)

MichaelJanz

Reputation: 1815

If you need all labels, why not just:

model.evaluate(test_dataset.take(-1))

or, if your dataset is too large for that, just iterate over it, compute your metric per batch, and take the mean at the end.
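A minimal sketch of that iterate-and-average idea, with fabricated labels and scores standing in for the real dataset and model. One caveat: the mean of per-batch AUCs only approximates the AUC over the full set; a streaming metric such as tf.keras.metrics.AUC, updated batch by batch via update_state(), aggregates exactly without holding everything in memory.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Compute the metric per batch and average at the end, so the full
# label/prediction arrays never have to be held in memory at once.
rng = np.random.default_rng(42)

batch_aucs = []
for _ in range(5):                # stand-in for: for images, labels in test_dataset
    labels = np.tile([0, 1], 4)   # fabricated binary labels for one batch
    scores = rng.random(8)        # stand-in for model.predict_on_batch(images)
    batch_aucs.append(roc_auc_score(labels, scores))

mean_auc = np.mean(batch_aucs)
print(mean_auc)
```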

Upvotes: 2
