Reputation: 516
I'm trying to wrap my head around this but struggling to understand how I can compute the f1-score in an object detection task.
Ideally, I would like to know false positives, true positives, false negatives and true negatives for every target in the image (it's a binary problem with an object in the image as one class and the background as the other class).
Eventually I would also like to extract the false positive bounding boxes from the image. I'm not sure if this is efficient, but I'd save the image names, the bbox predictions, and whether they are false positives etc. into a numpy file.
I currently have this set up with a batch size of 1 so I can apply a non-maximum suppression algorithm per image:
def apply_nms(orig_prediction, iou_thresh=0.3):
    # torchvision returns the indices of the bboxes to keep
    keep = torchvision.ops.nms(orig_prediction['boxes'], orig_prediction['scores'], iou_thresh)

    final_prediction = orig_prediction
    final_prediction['boxes'] = final_prediction['boxes'][keep]
    final_prediction['scores'] = final_prediction['scores'][keep]
    final_prediction['labels'] = final_prediction['labels'][keep]

    return final_prediction
cpu_device = torch.device("cpu")

model.eval()
with torch.no_grad():
    for images, targets in valid_data_loader:
        images = list(img.to(device) for img in images)
        outputs = model(images)
        outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs]
        predictions = apply_nms(outputs[0], iou_thresh=0.3)
Any idea on how I can determine the aforementioned classification metrics and f1-score?
I've come across this line in the evaluation code provided by torchvision and am wondering whether it would help me going forward:
res = {target["image_id"].item(): output for target, output in zip(targets, outputs)}
Upvotes: 0
Views: 2540
Reputation: 516
So I've implemented the f1 score to be calculated globally, that is, for the entire dataset.
The implementation below gives an example of determining the f1-score for a validation set.
The outputs of the model are in a dictionary format, and so we need to place them into lists like this:
predicted_boxes (list): [[train_index, class_prediction, prob_score, x1, y1, x2, y2],[],...[]]
train_index: index of the image that the specific bbox comes from
class_prediction: integer value representing the class prediction
prob_score: output objectness score for a bbox
x1, y1, x2, y2: (x1, y1) and (x2, y2) bbox coordinates
gt_boxes (list): [[train_index, class_prediction, prob_score, x1, y1, x2, y2],[],...[]]
Where prob_score is just 1 for the ground truth inputs (it could be anything really, as long as that dimension is specified and filled in).
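For illustration only (the image id, class id, and coordinate values below are made up), a single image with one prediction and one ground-truth box would produce entries shaped like this:

# hypothetical entries just to show the layout: [image_id, class, score, x1, y1, x2, y2]
predicted_boxes = [
    [7, 1, 0.92, 34.0, 50.0, 120.0, 200.0],
]
gt_boxes = [
    [7, 1, 1, 30.0, 48.0, 118.0, 205.0],   # score column is fixed to 1 for ground truth
]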
IoU is also implemented in torchvision, which makes everything a lot easier.
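For reference, torchvision.ops.box_iou takes two [N, 4] tensors of boxes in (x1, y1, x2, y2) format and returns an [N, M] matrix of pairwise IoUs; the boxes below are made-up values:

import torch
import torchvision

boxes_a = torch.tensor([[30.0, 48.0, 118.0, 205.0]])
boxes_b = torch.tensor([[34.0, 50.0, 120.0, 200.0]])
print(torchvision.ops.box_iou(boxes_a, boxes_b))   # tensor of shape [1, 1]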
I hope this helps others, as I couldn't find another implementation of the f1 score in object detection anywhere else.
from collections import Counter
import math

import torch
import torchvision

model_test.eval()

with torch.no_grad():
    global_tp = []
    global_fp = []
    global_gt = []

    valid_df_unique = get_unique(valid_df['image_id'])

    for images, targets in valid_data_loader:
        images = list(img.to(device) for img in images)
        outputs = model_test(images)
        outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs]

        predictions = apply_nms(outputs[0], iou_thresh=0.1)

        # looping through each class
        for c in range(num_classes):
            # detections (list): predicted_boxes that are class c
            detections = []
            # ground_truths (list): gt_boxes that are class c
            ground_truths = []

            for b, la, s in zip(predictions['boxes'], predictions['labels'], predictions['scores']):
                updated_detection_array = [targets[0]['image_id'].item(), la.item(), s.item(),
                                           b[0].item(), b[1].item(), b[2].item(), b[3].item()]
                if la.item() == c:
                    detections.append(updated_detection_array)

            for b, la in zip(targets[0]['boxes'], targets[0]['labels']):
                updated_gt_array = [targets[0]['image_id'].item(), la.item(), 1,
                                    b[0].item(), b[1].item(), b[2].item(), b[3].item()]
                if la.item() == c:
                    ground_truths.append(updated_gt_array)
                    global_gt.append(updated_gt_array)

            # use Counter to create a dictionary where key is image # and value
            # is the # of bboxes in the given image
            amount_bboxes = Counter([gt[0] for gt in ground_truths])

            # goal: keep track of the gt bboxes we have already "detected" with prior predicted bboxes
            # key: image #
            # value: tensor of 0's (size is equal to # of bboxes in the given image)
            for key, value in amount_bboxes.items():
                amount_bboxes[key] = torch.zeros(value)

            # sort over the probability scores of the detections
            detections.sort(key=lambda x: x[2], reverse=True)

            true_Positives = torch.zeros(len(detections))
            false_Positives = torch.zeros(len(detections))
            total_gt_bboxes = len(ground_truths)

            false_positives_frame = []
            true_positives_frame = []

            # iterate through all detections in given class c
            for detection_index, detection in enumerate(detections):
                # detection[0] indicates image #
                # ground_truth_image: the gt bboxes that are in the same image as the detection
                ground_truth_image = [bbox for bbox in ground_truths if bbox[0] == detection[0]]

                # num_gt_boxes: number of ground truth boxes in the given image
                num_gt_boxes = len(ground_truth_image)

                best_iou = 0
                best_gt_index = 0

                for index, gt in enumerate(ground_truth_image):
                    iou = torchvision.ops.box_iou(torch.tensor(detection[3:]).unsqueeze(0),
                                                  torch.tensor(gt[3:]).unsqueeze(0))
                    if iou > best_iou:
                        best_iou = iou
                        best_gt_index = index

                if best_iou > iou_threshold:
                    # check if the gt_bbox with best_iou was already covered by a previous detection with a higher confidence score
                    # amount_bboxes[detection[0]][best_gt_index] == 0 if not discovered yet, 1 otherwise
                    if amount_bboxes[detection[0]][best_gt_index] == 0:
                        true_Positives[detection_index] = 1
                        amount_bboxes[detection[0]][best_gt_index] = 1
                        true_positives_frame.append(detection)
                        global_tp.append(detection)
                    else:
                        false_Positives[detection_index] = 1
                        false_positives_frame.append(detection)
                        global_fp.append(detection)
                else:
                    false_Positives[detection_index] = 1
                    false_positives_frame.append(detection)
                    global_fp.append(detection)

# remove nan values from the ground truth list as the list contains every mitosis image row entry (including images with no targets)
global_gt_updated = []
for gt in global_gt:
    if math.isnan(gt[3]) == False:
        global_gt_updated.append(gt)

global_fn = len(global_gt_updated) - len(global_tp)

precision = len(global_tp) / (len(global_tp) + len(global_fp))
recall = len(global_tp) / (len(global_tp) + global_fn)

f1_score = 2 * (precision * recall) / (precision + recall)

print(len(global_tp))
print(recall)
print(precision)
print(f1_score)
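Since every entry in global_fp already stores the image id along with the box coordinates and score, saving the false positives for later inspection (as mentioned in the question) is just a matter of dumping the list with numpy; the filename here is only an example:

import numpy as np

# each row: [image_id, class_prediction, prob_score, x1, y1, x2, y2]
np.save('false_positive_boxes.npy', np.array(global_fp))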
Upvotes: 0
Reputation: 3958
The use of the terms precision, recall, and F1 score in object detection is slightly confusing because these metrics were originally used for binary evaluation tasks (e.g. classification). In any case, in object detection they have slightly different meanings:
let:
TP - set of predicted objects that are successfully matched to a ground truth object (above the IOU threshold for whatever dataset you're using, generally 0.5 or 0.7)
FP - set of predicted objects that were not successfully matched to a ground truth object
FN - set of ground truth objects that were not successfully matched to a predicted object
Precision: TP / (TP + FP)
Recall: TP / (TP + FN)
F1: 2 * Precision * Recall / (Precision + Recall)
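As a minimal sketch (not from any particular library), once the three sets are known the metrics come down to a few lines; the guards against empty denominators are my own addition:

def detection_f1(num_tp, num_fp, num_fn):
    # precision and recall follow directly from the set sizes defined above
    precision = num_tp / (num_tp + num_fp) if (num_tp + num_fp) > 0 else 0.0
    recall = num_tp / (num_tp + num_fn) if (num_tp + num_fn) > 0 else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)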
You can find many implementations of the matching step (matching ground truth and predicted objects), generally provided with a dataset for evaluation, or you can implement it yourself. I'd suggest the py-motmetrics repository.
A simple implementation of the IOU calculation might look like:
def iou(a, b):
    """
    Description
    -----------
    Calculates intersection over union for a single pair of boxes a and b

    Parameters
    ----------
    a : tensor of size [4]
        bounding box in (x1, y1, x2, y2) form
    b : tensor of size [4]
        bounding box in (x1, y1, x2, y2) form

    Returns
    -------
    iou - float between [0,1]
        iou of a and b
    """
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])

    minx = max(a[0], b[0])
    maxx = min(a[2], b[2])
    miny = max(a[1], b[1])
    maxy = min(a[3], b[3])

    intersection = max(0, maxx - minx) * max(0, maxy - miny)
    union = area_a + area_b - intersection
    iou = intersection / union
    return iou
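And a rough sketch of the matching step itself, using the iou function above greedily (highest-scoring detections first). This is just an illustration of the idea rather than the py-motmetrics approach, and the 0.5 threshold is only an example:

def match_detections(pred_boxes, pred_scores, gt_boxes, iou_threshold=0.5):
    # greedy matching: each ground truth box can be claimed by at most one detection
    order = sorted(range(len(pred_boxes)), key=lambda i: pred_scores[i], reverse=True)
    matched_gt = set()
    tp, fp = 0, 0
    for i in order:
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gt_boxes):
            if j in matched_gt:
                continue
            overlap = iou(pred_boxes[i], gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j is not None and best_iou > iou_threshold:
            tp += 1                  # matched above threshold -> true positive
            matched_gt.add(best_j)
        else:
            fp += 1                  # unmatched prediction -> false positive
    fn = len(gt_boxes) - len(matched_gt)   # unmatched ground truths -> false negatives
    return tp, fp, fn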
Upvotes: 2