Atif Butt

Reputation: 41

How to convert 2D bounding box pixel coordinates (x, y, w, h) into relative coordinates (Yolo format)?

Hi! I am annotating image data through an online platform which generates output coordinates like this: "bbox":{"top":634,"left":523,"height":103,"width":145}. However, I want to use these annotations to train YOLO, so I have to convert them to YOLO format, like this: 4 0.838021 0.605556 0.177083 0.237037

How can I do this conversion?

Upvotes: 4

Views: 8942

Answers (3)

Olivier D'Ancona

Reputation: 928

Short Answer

To convert a bounding box to YOLO format, you need the image width and height, because YOLO coordinates are normalized by the image size. The Albumentations documentation on bounding box formats has a great explanation.
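As a minimal sketch of that normalization in plain Python (no library), assuming the annotation dictionary from the question and an image size you supply; bbox_to_yolo is a hypothetical helper name, and 1920x1080 is only an assumed example size:

```python
def bbox_to_yolo(bbox, image_width, image_height):
    """Convert a pixel box {"top", "left", "width", "height"} to YOLO
    (x_center, y_center, width, height), each normalized by the image size."""
    # "left" is the x coordinate of the top-left corner, "top" is its y coordinate
    x_center = (bbox["left"] + bbox["width"] / 2) / image_width
    y_center = (bbox["top"] + bbox["height"] / 2) / image_height
    width = bbox["width"] / image_width
    height = bbox["height"] / image_height
    return x_center, y_center, width, height


bbox = {"top": 634, "left": 523, "height": 103, "width": 145}
print(bbox_to_yolo(bbox, 1920, 1080))  # image size is an assumption
```

For a 1920x1080 image this gives roughly (0.3102, 0.6347, 0.0755, 0.0954).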

Fast solution

I developed a lightweight Python library called bboxconverter that converts bounding boxes between formats such as COCO, YOLO, and Pascal VOC. You can check my repo on GitHub for more examples, explanations, how-to guides, and tutorials.

You could do the following:

#! pip install bboxconverter #python >= 3.8
from bboxconverter.core.bbox import CWH_BBox, TLWH_BBox

IMAGE_WIDTH = 1920
IMAGE_HEIGHT = 1080

# Create a BBox object
tlwh_bbox = TLWH_BBox(x_min=523,   # "left" from the annotation is the x coordinate
                      y_min=634,   # "top" from the annotation is the y coordinate
                      width=145,
                      height=103,
                      image_width=IMAGE_WIDTH,
                      image_height=IMAGE_HEIGHT,
                      class_name='',
                      file_path='')

# Convert to CWH_BBox
cwh_bbox = CWH_BBox.from_TLWH(tlwh_bbox).to_dict()
print(
    f"{cwh_bbox['x_center']}, {cwh_bbox['y_center']}, {cwh_bbox['width']}, {cwh_bbox['height']}"
)

For now, class_name and file_path must be specified. For reference:

  • CWH means CENTER, WIDTH, HEIGHT
  • TLWH means TOP, LEFT, WIDTH, HEIGHT
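Going the other way is a useful sanity check for any TLWH to CWH conversion: if mapping the YOLO values back to pixels does not return the original left/top/width/height, the inputs were probably passed in the wrong order (for example, top and left swapped). A minimal plain-Python sketch; yolo_to_tlwh is a hypothetical name (not part of bboxconverter) and the 1920x1080 image size is assumed:

```python
def yolo_to_tlwh(x_center, y_center, width, height, image_width, image_height):
    """Map normalized YOLO (CWH) values back to pixel TLWH: (left, top, width, height)."""
    w_px = width * image_width
    h_px = height * image_height
    left = x_center * image_width - w_px / 2   # center minus half the width
    top = y_center * image_height - h_px / 2   # center minus half the height
    return left, top, w_px, h_px


# Round trip for the box from the question, on an assumed 1920x1080 image
W, H = 1920, 1080
yolo = ((523 + 145 / 2) / W, (634 + 103 / 2) / H, 145 / W, 103 / H)
print([round(v, 6) for v in yolo_to_tlwh(*yolo, W, H)])
```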

Custom Solution

If you need to implement it yourself, there is a great article that demonstrates how to convert bounding boxes between formats. Here is how you could convert from COCO (TLWH) to YOLO (CWH):

def coco_to_yolo(x1, y1, w, h, image_w, image_h):
    return [((2*x1 + w)/(2*image_w)) , ((2*y1 + h)/(2*image_h)), w/image_w, h/image_h]
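In a YOLO training set, each image usually has a .txt label file with one line per object: the class index followed by the four normalized values, as in the question's example line. A minimal sketch that builds such a line from the question's JSON export using the coco_to_yolo function above; yolo_label_line is a hypothetical helper, and the class index 4 and the 1920x1080 image size are assumptions, since the export shown contains neither:

```python
import json


def coco_to_yolo(x1, y1, w, h, image_w, image_h):
    # Same formula as above: (2*x1 + w) / (2*image_w) == (x1 + w/2) / image_w
    return [((2 * x1 + w) / (2 * image_w)), ((2 * y1 + h) / (2 * image_h)), w / image_w, h / image_h]


def yolo_label_line(class_id, bbox, image_w, image_h):
    """One line of a YOLO .txt label: '<class> <x_center> <y_center> <width> <height>'."""
    values = coco_to_yolo(bbox["left"], bbox["top"], bbox["width"], bbox["height"], image_w, image_h)
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)


# Export record in the question's shape; class index 4 is an assumption
record = json.loads('{"bbox": {"top": 634, "left": 523, "height": 103, "width": 145}}')
line = yolo_label_line(4, record["bbox"], 1920, 1080)
print(line)
```

Writing one such line per object into a .txt file named after the image gives the usual YOLO label layout.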

Upvotes: 0

antonio leblanc

Reputation: 1

Convert bbox dictionary into list with relative coordinates

If you want to convert a Python dictionary with the keys top, left, width, height into a list in the format [x1, y1, x2, y2], where x1, y1 are the relative coordinates of the top-left corner of the bounding box and x2, y2 are the relative coordinates of its bottom-right corner, you can use the following function:

def bbox_dict_to_list(bbox_dict, image_size):
  h = bbox_dict.get('height')
  l = bbox_dict.get('left')
  t = bbox_dict.get('top')
  w = bbox_dict.get('width')

  img_w, img_h = image_size

  x1 = l/img_w
  y1 = t/img_h
  x2 = (l+w)/img_w
  y2 = (t+h)/img_h
  return [x1, y1, x2, y2]

Pass the bbox dictionary and the image size as a tuple: (image_width, image_height).

Example

bbox = {"top":634,"left":523,"height":103,"width":145} 
bbox_dict_to_list(bbox, (1280, 720))
>> [0.40859375, 0.8805555555, 0.521875, 1.02361111111]

You can change the return order to suit your needs.
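Note that this list is in corner format, not YOLO's center format, and that y2 in the example is above 1: with a 720-pixel-tall image, the box ends at 634 + 103 = 737 pixels, past the bottom edge. A minimal sketch that clips the corners to the image and converts them to YOLO's (x_center, y_center, width, height); corners_to_yolo and clip01 are hypothetical helpers, and clipping is only one assumed way to handle such boxes:

```python
def clip01(v):
    # Keep a normalized coordinate inside the image
    return min(max(v, 0.0), 1.0)


def corners_to_yolo(x1, y1, x2, y2):
    """Relative [x1, y1, x2, y2] corners -> YOLO (x_center, y_center, width, height),
    clipping the corners to the image first."""
    x1, y1, x2, y2 = (clip01(v) for v in (x1, y1, x2, y2))
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1


# Corners from the example above (box partly outside a 1280x720 image)
print(corners_to_yolo(523 / 1280, 634 / 720, 668 / 1280, 737 / 720))
```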

Upvotes: 0

Sivaram Rasathurai

Reputation: 6333

Here, for size you need to pass the image (w, h), and for box you need to pass (x, x+w, y, y+h), i.e. (xmin, xmax, ymin, ymax). Source: https://github.com/ivder/LabelMeYoloConverter/blob/master/convert.py

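def convert(size, box):
    # size = (image_width, image_height); box = (xmin, xmax, ymin, ymax) in pixels
    dw = 1./size[0]
    dh = 1./size[1]
    x = (box[0] + box[1])/2.0   # box center, in pixels
    y = (box[2] + box[3])/2.0
    w = box[1] - box[0]         # box size, in pixels
    h = box[3] - box[2]
    x = x*dw                    # normalize by the image size
    w = w*dw
    y = y*dh
    h = h*dh
    return (x,y,w,h)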
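The box argument order is easy to get wrong: both x values come first, then both y values. A minimal sketch of calling this function with the dictionary from the question (the 1920x1080 image size is an assumption):

```python
def convert(size, box):
    # Same function as above: size = (image_w, image_h), box = (xmin, xmax, ymin, ymax)
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    return (x * dw, y * dh, w * dw, h * dh)


bbox = {"top": 634, "left": 523, "height": 103, "width": 145}
# Both x values first (left, right), then both y values (top, bottom)
box = (bbox["left"], bbox["left"] + bbox["width"], bbox["top"], bbox["top"] + bbox["height"])
print(convert((1920, 1080), box))  # image size is an assumption
```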

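Alternatively, if you have the top-left corner (x, y) and the size (w, h) in pixels, you can use the function below. It also needs the image size, since the coordinates are normalized by it:

def convert(x, y, w, h, img_w, img_h):
    dw = 1.0/img_w      # normalize by the image size, not the box size
    dh = 1.0/img_h
    x = (2*x + w)/2.0   # x center = x + w/2
    y = (2*y + h)/2.0   # y center = y + h/2
    x = x*dw
    y = y*dh
    w = w*dw
    h = h*dh
    return (x,y,w,h)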

Each grid cell predicts B bounding boxes as well as C class probabilities. The bounding box prediction has 5 components: (x, y, w, h, confidence). The (x, y) coordinates represent the center of the box, relative to the grid cell location (remember that if the center of the box does not fall inside the grid cell, then this cell is not responsible for it). These coordinates are normalized to fall between 0 and 1. The (w, h) box dimensions are also normalized to [0, 1], relative to the image size.

Source: What does the coordinate output of yolo algorithm represent?
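In a YOLOv1-style setup the label files still store image-relative centers; the responsible grid cell and the in-cell offset described above are derived from them. A minimal sketch of that step, assuming an S x S grid with S = 7; to_grid_cell is a hypothetical helper, not part of any YOLO codebase:

```python
import math


def to_grid_cell(x_center, y_center, S=7):
    """Given an image-relative YOLO center (both in [0, 1]), return the responsible
    grid cell (row, col) and the center's offset inside that cell, in [0, 1]."""
    # Clamp to S - 1 so a center exactly at 1.0 still falls in the last cell
    col = min(math.floor(x_center * S), S - 1)
    row = min(math.floor(y_center * S), S - 1)
    x_offset = x_center * S - col
    y_offset = y_center * S - row
    return (row, col), (x_offset, y_offset)


# Center of the question's box on an assumed 1920x1080 image
print(to_grid_cell(0.31015625, 0.6347222))
```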

Upvotes: 5
