Benoît Boidin

Reputation: 87

What do the FiftyOne evaluation metrics mean?

I have a dataset I use to test my object detection model, let's say test_dataset.

When evaluating with COCO eval (through YOLOX eval.py script) for a given model, I get this result:

Average forward time: 23.05 ms, Average NMS time: 2.60 ms, Average inference time: 25.65 ms
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.724
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.957
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.831
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.278
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.591
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.810
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.535
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.755
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.759
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.349
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.649
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.839
per class AP:
| class        | AP     | class   | AP     | class        | AP     |
|:-------------|:-------|:--------|:-------|:-------------|:-------|
| cargo        | 59.491 | ferry   | 87.701 | fishing boat | 67.328 |
| sailing boat | 75.134 |         |        |              |        |
per class AR:
| class        | AR     | class   | AR     | class        | AR     |
|:-------------|:-------|:--------|:-------|:-------------|:-------|
| cargo        | 64.802 | ferry   | 89.717 | fishing boat | 71.506 |
| sailing boat | 77.748 |         |        |              |        |

However, when evaluating with FiftyOne, I get the following:

              precision    recall  f1-score   support

       cargo       0.76      0.91      0.83       606
       ferry       0.97      1.00      0.99       990
fishing boat       0.85      0.96      0.91       332
sailing boat       0.87      0.97      0.92       706

   micro avg       0.88      0.97      0.92      2634
   macro avg       0.87      0.96      0.91      2634
weighted avg       0.88      0.97      0.92      2634

I was using this script:

results = dataset.evaluate_detections(
    "predictions",
    gt_field="detections",
    compute_mAP=True,
    method="coco"
)

results.print_report()

I was expecting the same precision and recall metrics, since both use COCO-style evaluation. Setting the iou parameter of evaluate_detections() doesn't bring the values any closer either.
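For instance, evaluating at a single stricter threshold (same fields as the script above; 0.75 is just an example) still doesn't line up with the COCO table:

# Same evaluation as above, but at one stricter IoU threshold
results_075 = dataset.evaluate_detections(
    "predictions",
    gt_field="detections",
    method="coco",
    iou=0.75,
)

results_075.print_report()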

I can't understand how to make these metrics match.

Upvotes: 0

Views: 172

Answers (1)

Jacob

Reputation: 111

FiftyOne's evaluate_detections() only performs the evaluation at one specified IoU threshold (default 0.5), which is why your FiftyOne numbers sit much closer to the IoU=0.50 row of the COCO output than to the 0.50:0.95 averages. Precision and recall are shown per class at the top of the report and averaged at the bottom. Micro-averaging assigns equal weight to every detection across all classes; macro-averaging assigns equal weight to each class, regardless of how many examples it has. Support is the number of instances used in the evaluation: a 70% recall over only 10 examples means something very different from a 70% recall over 1,000 examples.
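A rough numeric illustration of the difference (made-up counts, not your data):

# Micro: pool true/false positives over all classes, then compute one precision.
# Macro: compute precision per class, then take the unweighted mean.
tp = {"cargo": 90, "ferry": 99}   # hypothetical true positive counts
fp = {"cargo": 30, "ferry": 1}    # hypothetical false positive counts

micro = sum(tp.values()) / (sum(tp.values()) + sum(fp.values()))
macro = sum(tp[c] / (tp[c] + fp[c]) for c in tp) / len(tp)

print(f"micro precision: {micro:.3f}")  # 189/220 = 0.859
print(f"macro precision: {macro:.3f}")  # (0.750 + 0.990)/2 = 0.870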

To get alignment with the COCO evaluation script, you would want to run FiftyOne's evaluation routine at each IoU threshold from 0.5 to 0.95 (spacing of 0.05) and aggregate the results.
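A minimal sketch of that loop, reusing the field names from your script; I'm assuming here that results.metrics() returns a dict with micro-averaged "precision" and "recall" keys, so double-check that against your FiftyOne version:

import numpy as np

# Re-run the COCO-style evaluation at each of the ten COCO IoU thresholds
# (0.50, 0.55, ..., 0.95) and average the micro precision/recall across them
iou_threshs = np.arange(0.5, 1.0, 0.05)

precisions, recalls = [], []
for iou in iou_threshs:
    results = dataset.evaluate_detections(
        "predictions",
        gt_field="detections",
        method="coco",
        iou=float(iou),
    )
    metrics = results.metrics()  # assumed to be micro-averaged by default
    precisions.append(metrics["precision"])
    recalls.append(metrics["recall"])

print(f"precision @ IoU 0.50:0.95: {np.mean(precisions):.3f}")
print(f"recall    @ IoU 0.50:0.95: {np.mean(recalls):.3f}")

Keep in mind that COCO's AP also integrates over the precision/recall curve, so the aggregated numbers may still not match to the decimal, but they should land much closer. And since you already pass compute_mAP=True, results.mAP() should give you the aggregated COCO-style mAP over IoU 0.50:0.95 directly.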

Upvotes: 2
