hhz

Reputation: 19

How to understand the auc_precision_recall curve in TensorBoard?

Versions:

TensorFlow: 1.6.0
TensorBoard: 1.6.0

What I'm doing and what I'm familiar with:

  1. I'm using the pre-made Estimator tf.estimator.DNNClassifier to train a binary classification model on a heavily skewed (imbalanced) dataset.
  2. Because of the imbalance, I have to use the Precision-Recall curve rather than the ROC AUC to choose the best model.
  3. I changed nothing in tf.estimator.DNNClassifier apart from setting these three parameters: hidden_units, feature_columns, model_dir.
  4. Once the accuracy of the model reached a plateau and stopped improving, I continued training like this: remove one feature at a time from the full feature set and retrain, so that I can get rid of as many noisy features as possible.
  5. I followed Step 4. Every time I removed a feature I got a new training result and a new auc_precision_recall plot from TensorBoard. That is, removing FEATURE_A gave figure A, removing FEATURE_B gave figure B, and removing FEATURE_C gave figure C.
    The plots are: [figure A], [figure B], [figure C]
  6. Description of the auc_precision_recall figures above:
    • x axis: training step.
    • y axis: ranges from 0 to 1 (this is what I want to know: what does y mean?).
  7. Below is a standard Precision-Recall curve from this site (I include it here only to make the discussion easier).
    [standard Precision-Recall curve]
  8. Description of the standard Precision-Recall curve above:
    • x axis: Recall, ranging from 0 to 1.
    • y axis: Precision, ranging from 0 to 1.

My questions:

  1. What does a value on the y axis of a TensorBoard auc_precision_recall curve mean?
  2. What is the relationship between a TensorBoard auc_precision_recall curve and a standard Precision-Recall curve?
  3. Why are the y values in a TensorBoard auc_precision_recall curve so strange?
    • In figure A, the first point is (x, y) = (1, 0.5009). Why is y already 0.5009 at the first step, and why do most of the other values also stay around 0.5 (as figure A clearly shows)?
    • In figure B, the first point is (x, y) = (7, 0.4625). Why is this y value (0.4625) not close to 0 during the first few training steps, as it is in figure C?

Upvotes: 0

Views: 1938

Answers (2)

Eric

Reputation: 1

To answer questions 1 and 2: AUC means Area Under the Curve, so you are looking at the area under the Precision-Recall (PR) curve. Each point on the y axis is that area at a given training step. It lies between 0 and 1 because those are the minimum and maximum areas achievable under a PR curve.
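
To make the link between the two plots concrete, here is a minimal sketch in plain NumPy (not TensorFlow's own implementation) that sweeps a set of thresholds, collects the (recall, precision) points of a standard PR curve, and reduces them to a single area value. The function names, the threshold count and the "precision is 1.0 when nothing is predicted positive" convention are assumptions made for illustration.

```python
import numpy as np

def pr_points(labels, scores, thresholds):
    """Return (recall, precision) arrays, one entry per threshold."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    recalls, precisions = [], []
    for t in thresholds:
        predicted = scores >= t
        tp = np.sum(predicted & labels)
        fp = np.sum(predicted & ~labels)
        fn = np.sum(~predicted & labels)
        # Assumed convention: precision is 1.0 when nothing is predicted positive.
        precisions.append(tp / (tp + fp) if tp + fp > 0 else 1.0)
        recalls.append(tp / (tp + fn) if tp + fn > 0 else 0.0)
    return np.array(recalls), np.array(precisions)

def pr_auc(labels, scores, num_thresholds=200):
    """Area under the PR curve, using straight lines between adjacent points."""
    # Descending thresholds make recall non-decreasing along the array.
    thresholds = np.linspace(1.0, 0.0, num_thresholds)
    r, p = pr_points(labels, scores, thresholds)
    return float(np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2.0))

# One such number per evaluation is what ends up as one point in the scalar plot.
perfect = pr_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```

In other words, a whole standard PR curve collapses into one scalar, and the TensorBoard plot tracks how that scalar changes over training steps.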

Upvotes: 0

hhz

Reputation: 19

I've found the answer: this is a bug in TensorFlow 1.6.0, caused by using the wrong method (the trapezoidal rule) to calculate the AUC_PR value. The bug has been fixed in the latest version, 1.8.0, by this commit. So if you are training on a heavily skewed dataset, remember to update TensorFlow to the latest version, 1.8.0.
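
A rough illustration of why the trapezoidal rule can report a value near 0.5 on skewed data, written in plain NumPy rather than taken from the TensorFlow source. It assumes a hypothetical untrained model that gives every example the same score, so the PR curve has only two points: nothing predicted positive (recall 0, precision taken as 1.0 by convention) and everything predicted positive (recall 1, precision equal to the positive rate). The alternative shown, interpolating true and false positive counts linearly between the two points, is one known way to avoid the overestimate; whether it matches the linked commit exactly is an assumption. The class counts are made up.

```python
import numpy as np

def trapezoidal_segment(tp_a, fp_a, tp_b, fp_b, num_pos):
    """Area under a straight line joining two PR points."""
    def precision(tp, fp):
        # Assumed convention: precision is 1.0 when nothing is predicted positive.
        return tp / (tp + fp) if tp + fp > 0 else 1.0
    d_recall = (tp_b - tp_a) / num_pos
    return d_recall * (precision(tp_a, fp_a) + precision(tp_b, fp_b)) / 2.0

def interpolated_segment(tp_a, fp_a, tp_b, fp_b, num_pos, steps=10000):
    """Area when TP and FP counts, not precision, are interpolated linearly.

    Requires tp_b > tp_a. Uses a numerical midpoint rule over the TP count.
    """
    edges = np.linspace(tp_a, tp_b, steps + 1)
    tp = (edges[:-1] + edges[1:]) / 2.0
    fp = fp_a + (fp_b - fp_a) * (tp - tp_a) / (tp_b - tp_a)
    precision = tp / (tp + fp)
    d_recall = (tp_b - tp_a) / num_pos
    return float(np.mean(precision) * d_recall)

# Made-up skewed dataset: 20 positives, 9980 negatives.
num_pos, num_neg = 20, 9980
trap = trapezoidal_segment(0, 0, num_pos, num_neg, num_pos)
interp = interpolated_segment(0, 0, num_pos, num_neg, num_pos)
print(trap, interp)
```

Under these assumptions the straight line from precision 1.0 down to the positive rate covers roughly half of the unit square, while the count-based interpolation stays at the positive rate, which is what an uninformative model should score.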

Upvotes: 0
