Vivek

Reputation: 569

How is the AUC calculated for multi-class data in tensorflow?

The documentation for tf.keras.metrics.AUC says that when estimating the AUC for multi-class data (i.e., multi_label=False), the data should be flattened into a single label before AUC computation.

What exactly does this mean?


Also, if you have multi-class data, it's possible to train a model without flattening the labels beforehand, e.g., for a given model, you can run

model.compile(loss="binary_crossentropy", metrics=[tf.keras.metrics.AUC()])

and the model will calculate an AUC for you at each epoch. I know that using binary cross-entropy loss in a multi-class problem tells TensorFlow to set up a multi-label classification problem (see here), but I haven't told tf.keras.metrics.AUC that the data is multi-label. So what exactly is it calculating in this case?

Upvotes: 2

Views: 488

Answers (1)

akensert

Reputation: 304

To my understanding, 'flattened' means that the data will be reshaped into a one-dimensional array as follows: [[0, 0, 1, 0], ..., [1, 0, 0, 0]] --> [0, 0, 1, 0, ..., 1, 0, 0, 0]

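If multi_label=True, AUC will be computed separately for each label, then averaged across labels. If multi_label=False (default), AUC will be computed over the flattened data. So multi-label data (e.g., [[0, 1, 1, 0], ..., [1, 1, 0, 0]]) will be treated as a single label ([0, 1, 1, 0, ..., 1, 1, 0, 0]), with each label-prediction pair counted as an individual data point. Since your model.compile call uses AUC() with its defaults, this flattened AUC is what gets reported in your case.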

From the documentation: "multi_label: boolean indicating whether multilabel data should be treated as such, wherein AUC is computed separately for each label and then averaged across labels, or (when False) if the data should be flattened into a single label before AUC computation. In the latter case, when multilabel data is passed to AUC, each label-prediction pair is treated as an individual data point. Should be set to False for multi-class data."
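To make the difference concrete, here is a minimal sketch in plain Python of the two modes described above. It is not TensorFlow's implementation: the helper roc_auc and the toy arrays are made up for illustration, and it computes the exact pairwise AUC, whereas tf.keras.metrics.AUC approximates the curve over a fixed set of thresholds, so the numbers Keras reports may differ slightly.

```python
def roc_auc(labels, scores):
    """Exact ROC AUC: the probability that a random positive is scored
    above a random negative, with ties counted as half a win.
    (Hypothetical helper, not part of TensorFlow.)"""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 samples, 3 classes (one-hot targets).
y_true = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
y_pred = [
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.5, 0.3],
    [0.4, 0.5, 0.1],
]

# Analogue of multi_label=False: flatten everything into one long
# vector, so each (label, prediction) pair is its own data point.
flat_true = [y for row in y_true for y in row]
flat_pred = [p for row in y_pred for p in row]
auc_flat = roc_auc(flat_true, flat_pred)

# Analogue of multi_label=True: one AUC per label (column), then average.
n_labels = len(y_true[0])
per_label = [
    roc_auc([row[j] for row in y_true], [row[j] for row in y_pred])
    for j in range(n_labels)
]
auc_per_label_mean = sum(per_label) / n_labels

print("flattened AUC:     ", auc_flat)
print("per-label AUCs:    ", per_label)
print("mean per-label AUC:", auc_per_label_mean)
```

The two results generally do not agree: the flattened version pools all labels into a single ranking (similar in spirit to a micro-average), while the per-label version weights every label equally regardless of how many positives it has.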

Upvotes: 1
