How to define bounding box label in YOLO V1?

Question

I'm trying to reproduce YOLO v1, but I ran into a problem when creating labels.
YOLO v1 has 2 bounding boxes per grid cell, so when a cell contains an object, how do I choose which bounding box gets the location information?

For example

In a 416*416 picture, there is an object [x,y,w,h]=[100,200,300,400].
Following the YOLO paper, I normalized the location to [0.25, 0.084, 0.72, 0.96].
As in the YOLO paper, I need to create a 7*7*30 tensor. Most of it is 0, but the entry at [4,5,:] should be a vector of length 30: [confidence1, x1, y1, w1, h1, confidence2, x2, y2, w2, h2, c0,c1 ...] like this:

My question is: what should [confidence1, x1, y1, w1, h1, confidence2, x2, y2, w2, h2] be?

Which one should I choose: A, B, or C? And why?

Hadi GhahremanNezhad · Accepted Answer

Each grid cell predicts a fixed number (B) of bounding boxes, each with a confidence score. The confidence score is the probability that the box contains an object multiplied by the intersection over union (IoU) of the predicted box and the ground truth box. Each bounding box is described by 5 numbers: a quadruple (x, y, w, h) and the confidence score of the box. x and y are the coordinates of the center of the box, and w and h are the width and height of the box respectively. In the paper, x and y are expressed relative to the bounds of the grid cell, while w and h are relative to the width and height of the whole image, so all four are float values between 0.0 and 1.0. The confidence score indicates how likely the box is to contain an object.

Each grid cell also predicts conditional class probabilities, one for each category of objects, regardless of the value of B. "Conditional" means that the probability of the object belonging to a specific class is conditioned on the cell containing an object. Thus, for each grid cell, there are B×5 numbers holding the bounding box information plus C class probabilities. This prediction information is encoded as a tensor of shape (S, S, B×5 + C).
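As a rough illustration of the confidence score described above, here is a minimal sketch in plain Python. The function names `iou_center` and `confidence_target` are hypothetical, and the sketch assumes boxes are given as (x_center, y_center, width, height) in the same units. It also assumes the training-time convention from the paper, where the target confidence is the IoU when the cell contains an object and 0 otherwise.

```python
def iou_center(box_a, box_b):
    """IoU of two boxes given as (x_center, y_center, width, height)."""
    # Convert center format to corner coordinates.
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2

    # Overlap along each axis, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def confidence_target(has_object, pred_box, truth_box):
    """Pr(object) * IoU, with Pr(object) taken as 1 or 0 (assumed convention)."""
    return iou_center(pred_box, truth_box) if has_object else 0.0
```

Because the target depends on the predicted box, it changes during training. This is also how the paper decides which of the B predictors is "responsible" for an object: the one whose current prediction has the highest IoU with the ground truth.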

As mentioned in the paper:

predictions are encoded as an S × S × (B * 5 + C) tensor. For evaluating YOLO on PASCAL VOC, we use S=7, B=2. PASCAL VOC has 20 labelled classes so C=20. Our final prediction is a 7×7×30 tensor.
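The arithmetic in that quote can be checked with a short sketch. The helper name `yolo_v1_output_shape` is hypothetical; it only evaluates the S × S × (B * 5 + C) formula for the values given in the paper.

```python
def yolo_v1_output_shape(S=7, B=2, C=20):
    """Shape of the prediction tensor: S x S cells, B*5 + C values per cell."""
    per_cell = B * 5 + C  # 5 numbers per box (confidence, x, y, w, h) plus C class scores
    return (S, S, per_cell)


shape = yolo_v1_output_shape()              # PASCAL VOC settings from the paper
total_values = shape[0] * shape[1] * shape[2]
print(shape, total_values)                  # (7, 7, 30) 1470
```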

As the dataset has 20 classes and you have 2 bounding boxes in each cell, you would have B×5+C = 2×5+20 = 30 values per cell. The last 20 are class probabilities, which can be 0. In your tensor, you put 5 zeros, so I assume you have 5 classes of objects. So in this case:

B×5+C = 2×5+5 = 15, and if your S is 7, the default value in YOLO, your tensor will have 7×7×15 = 735 elements. So for each grid cell you have 15 values, and all these length-15 vectors are attached to each other. For your question, option A seems more likely to be correct.

This figure is for YOLOv3 and in your case, for each cell you have 2 bounding boxes and 20 classes (as opposed to the figure where there are 5 boxes and 6 classes). But the idea is similar.
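To make the per-cell layout concrete, here is a minimal sketch of building a label tensor for a single object in plain Python. Several things here are assumptions rather than something the paper or this answer prescribes: the function name `encode_label` is hypothetical, (x, y) is taken to be the box center in pixels, the center is stored as an offset within its grid cell, and the same ground-truth box is written into every one of the B slots. Writing it into every slot is one convention some implementations use, leaving the loss to decide which predictor is responsible.

```python
def encode_label(x_c, y_c, w, h, img_w, img_h, class_id, S=7, B=2, C=20):
    """Build an S x S x (B*5 + C) target (nested lists) for one object.

    Per-cell layout, as in the question:
    [conf1, x1, y1, w1, h1, ..., confB, xB, yB, wB, hB, c0, c1, ...]
    """
    depth = B * 5 + C
    target = [[[0.0] * depth for _ in range(S)] for _ in range(S)]

    # Box center in grid units; the integer part selects the cell.
    gx = x_c / img_w * S
    gy = y_c / img_h * S
    col = min(int(gx), S - 1)  # clamp in case the center lies on the image edge
    row = min(int(gy), S - 1)

    # Center as an offset inside the cell; size relative to the whole image.
    x_off, y_off = gx - col, gy - row
    w_n, h_n = w / img_w, h / img_h

    cell = target[row][col]
    for b in range(B):  # assumption: same ground truth in every box slot
        cell[b * 5:b * 5 + 5] = [1.0, x_off, y_off, w_n, h_n]
    cell[B * 5 + class_id] = 1.0  # one-hot class probability
    return target, row, col


# The object from the question, assuming [x, y] is its center and its class is 0.
target, row, col = encode_label(100, 200, 300, 400, 416, 416, class_id=0)
```

All other cells stay at zero, which matches the question's observation that most of the tensor is 0.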
