How to convert the coordinates of YOLO labels when cropping an image?

Question

I've created over 1200 images with labels for YOLO detection. Every image is 800x600, and all the labeled objects are in the middle of the image, so I want to crop away the rest. The cropped images would be something like 400x300 (cropping the left, right, top and bottom equally), and the objects would still be in the middle. How can I convert or change the label coordinates without labeling everything all over again?

(I used LabelImg for YOLO.)

    0 0.545000 0.722500 0.042500 0.091667
    1 0.518750 0.762500 0.097500 0.271667

Here is one of my label .txt files. Sorry for my bad English!

Bindestrich · Accepted Answer

I was just working this out myself, so here is a complete explanation of why the formula at the bottom is correct.

Let's go over how these annotations are formatted.

         x
 0--------------->1    
 |       .
 |   _________   
 |   |   .   | ^
 |   |   .   | |
y|...|...*   | h
 |   |       | |
 |   |_______| v
 |   <---w--->
 V 
 1

Each line is 5 numbers separated by spaces, n x y w h, where:

- n is the number of your class, e.g. 0: "tree", 1: "car", etc.
- x is the normalized x coordinate of the center of your marked area
- y is the normalized y coordinate of the center of your marked area
- w is the normalized width of the marked area
- h is the normalized height of the marked area
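
As a small illustration of this format, here is a sketch that reads one line of a label file into named fields and writes it back. YoloBox, parse_yolo_line and format_yolo_line are hypothetical helpers made up for this example, not part of LabelImg or any YOLO library; the 6-decimal output simply matches the sample label file in the question.

```python
from typing import NamedTuple


class YoloBox(NamedTuple):
    """One line of a YOLO label file: class index plus normalized box."""
    n: int      # class index, e.g. 0 = "tree", 1 = "car"
    x: float    # normalized x of the box center
    y: float    # normalized y of the box center
    w: float    # normalized box width
    h: float    # normalized box height


def parse_yolo_line(line: str) -> YoloBox:
    """Split 'n x y w h' on whitespace and convert each field."""
    n, x, y, w, h = line.split()
    return YoloBox(int(n), float(x), float(y), float(w), float(h))


def format_yolo_line(box: YoloBox) -> str:
    """Write a box back with 6 decimals, like the sample label file."""
    return f"{box.n} {box.x:.6f} {box.y:.6f} {box.w:.6f} {box.h:.6f}"


box = parse_yolo_line("0 0.545000 0.722500 0.042500 0.091667")
print(box.x, box.h)  # 0.545 0.091667
```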

W and H are the width and height of the original image. A normalized value is relative to the width or height of the image, not measured in pixels or any other unit; it is a proportion. For example, x is normalized like this: x_normalized = x[px] / W[px]. Likewise, y and h are divided by H, and w is divided by W.
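
A minimal sketch of that normalization in both directions, assuming the box is given as a center and a size in pixels (normalize_box and denormalize_box are names made up for this example):

```python
def normalize_box(x_px, y_px, w_px, h_px, W, H):
    """Pixel center/size -> normalized YOLO values (divide by the image size)."""
    return x_px / W, y_px / H, w_px / W, h_px / H


def denormalize_box(x, y, w, h, W, H):
    """Normalized YOLO values -> pixel center/size (multiply by the image size)."""
    return x * W, y * H, w * W, h * H


# First label line from the question, on an 800x600 image
px = denormalize_box(0.545, 0.7225, 0.0425, 0.091667, 800, 600)
print([round(v, 4) for v in px])  # [436.0, 433.5, 34.0, 55.0002]
```

Rounding is only there to hide floating-point noise in the printout; normalize_box(*px, 800, 600) gives back the original label values.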

A few advantages of this:

- All values are in the range 0 to 1, so it is easy to tell when a value is out of frame (< 0 or > 1).
- It does not matter whether you upscale or downscale the image.
- The unit of measurement is irrelevant.
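
The first two points can be checked directly. In this sketch the same object is measured once in an 800x600 image and once in a copy downscaled to 400x300: the pixel position halves, but the normalized value stays the same. is_in_frame is a hypothetical helper.

```python
def is_in_frame(value: float) -> bool:
    """Normalized coordinates inside the image lie in [0, 1]."""
    return 0.0 <= value <= 1.0


# Same object center in an 800x600 image and in a 400x300 downscaled copy
x_big = 436 / 800
x_small = 218 / 400
print(x_big == x_small, is_in_frame(x_big), is_in_frame(1.02))  # True True False
```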

The y axis goes from top to bottom; everything else is like your standard coordinate system.
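
Because y grows downwards, the top edge of a box is the one with the smaller y. A sketch converting between the center format and corner coordinates (to_corners and to_center are names chosen for this example):

```python
def to_corners(x, y, w, h):
    """Center format -> (left, top, right, bottom), still normalized.

    y grows downwards, so 'top' is the smaller y value.
    """
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2


def to_center(left, top, right, bottom):
    """Inverse of to_corners: back to (x, y, w, h)."""
    return (left + right) / 2, (top + bottom) / 2, right - left, bottom - top


print(to_corners(0.5, 0.5, 0.25, 0.5))  # (0.375, 0.25, 0.625, 0.75)
```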

Now to cropping. Let's take this picture of a tree:

      W
   0------>1
   |⠀⢀⣴⣶⣤⣄⠀| 
   |⢠⣿⣿⣿⣿⣿⡆|
H  |⠈⠿⠿⣯⠿⠿⠁|
   | ⠀⠀⣿⠀  |⠀⠀
   v  ⠐⠛⠃⠀ |⠀
   1--------

scaling

We will now crop to the top left quarter of the tree image.

 _____
 | ⣴⣶|  
 |⢠⣿⣿|
 -----

Our new image width W' is now only half of the original W, and likewise H' = 0.5·H. The center of the old image is now the bottom right corner. We know the center of the old image, p, is at (0.5, 0.5), and the bottom right corner of the new image is at p' = (1, 1). If we instead cropped so that (0.3, 0.3) in the old image became the new bottom right corner, that point would also end up at (1, 1). 0.5 is ½, and to get from ½ to 1 we multiply by 2; for ⅓ we multiply by 3, for ¼ by 4. In general, if we reduce the width or height to a/b of its original size, we need to multiply by b/a.
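
Since this case has no translation (the top left corner stays at the origin), a point only needs to be divided by the fraction of the width or height that was kept. A sketch using Python's fractions module so the results are exact; scale_only is a hypothetical name:

```python
from fractions import Fraction


def scale_only(p, kept):
    """Crop that keeps the top left corner fixed and keeps the fraction
    `kept` of the width (or height): multiply by the inverse of `kept`."""
    return p / kept


half, third = Fraction(1, 2), Fraction(1, 3)
print(scale_only(half, half))                        # old center 0.5 -> 1
print(scale_only(Fraction(3, 10), Fraction(3, 10)))  # old 0.3 -> 1
print(scale_only(Fraction(1, 6), third))             # 1/6 -> 1/2
```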

translation

But we may also want to move the top left corner of the image, our coordinate origin O. Let's crop to the tree trunk:

   O'---
H' |⠀⣿⠀|⠀⠀
   |⠐⠛⠃|
   ----q'
     W'

W is 7 characters wide and the new width W' is 3; H = 5 and H' = 2. The old origin O is at (0, 0), of course, and the new origin O' is at (2, 3) in characters, which normalized to the original image is (2/7, 3/5) ≈ (0.286, 0.6). O' should become (0, 0), so we reduce x and y by 0.286 and 0.6 respectively before we scale the new value. For O' itself this is not very interesting, because 0 times anything is 0.
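
Translation alone only takes care of the origin. This sketch, using exact fractions, shows that after subtracting (2/7, 3/5) the new origin O' lands on (0, 0), while the bottom right corner of the trunk, at (5, 5) in characters, only reaches (3/7, 2/5). That is why a scaling step still has to follow. The variable and function names are chosen for this example.

```python
from fractions import Fraction

W, H = 7, 5                              # tree image size in characters
dx, dy = Fraction(2, W), Fraction(3, H)  # normalized position of O'


def translate(x, y):
    """Shift a normalized point so that O' becomes the origin."""
    return x - dx, y - dy


o_shifted = translate(Fraction(2, 7), Fraction(3, 5))  # O' itself
q_shifted = translate(Fraction(5, 7), Fraction(5, 5))  # bottom right of the trunk
print(o_shifted == (0, 0))                            # True
print(q_shifted == (Fraction(3, 7), Fraction(2, 5)))  # True, not yet (1, 1)
```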

Let's do another example: the bottom right corner of our new cropped image of the tree trunk. Let's call this point q. We know that in the coordinate system of the cropped image q must be at q' = (1, 1); it is the bottom right corner, after all.

We already measured: W = 7, W' = 3, H = 5, H' = 2.
By how much did we shrink the width and height, as a proportion?

The new width is W'/W = 3/7 of the original, so we have to scale by the inverse, W/W' = 7/3 ≈ 2.33. For the height, H'/H = 2/5, so we scale by H/H' = 5/2 = 2.5. Let's call these scaling factors s_w = 7/3 and s_h = 5/2.

In the original image, measured in characters from O, q is at (5, 5). Let's put our formula to the test. We moved our origin by 2 in the x (width) direction and by 3 in the y (height) direction; let's call these Δw = 2 and Δh = 3.

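For q'_x we subtract 2 from q_x because Δw = 2, which gives 5 - 2 = 3. Now we normalize 3 by dividing by the original width W = 7, so the shifted x is 3/7. Scaling by s_w = 7/3 gives 7/3 · 3/7 = 1, as expected. The same works for q'_y: 5 - 3 = 2, normalized by H = 5 gives 2/5, and 5/2 · 2/5 = 1. Now that we have checked our logic, we can write an algorithm.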
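
The same check with exact fractions, for both corners of the crop. W, H, W_c, H_c, dw and dh are the character measurements from the tree example; to_cropped is a hypothetical helper that shifts by the crop offset, normalizes by the original size and then scales by W/W' and H/H'.

```python
from fractions import Fraction

W, H = 7, 5          # original size in characters
W_c, H_c = 3, 2      # cropped size in characters
dw, dh = 2, 3        # offset of the new origin O' in characters

s_w = Fraction(W, W_c)   # 7/3
s_h = Fraction(H, H_c)   # 5/2


def to_cropped(x_chars, y_chars):
    """Shift by the crop offset, normalize by the ORIGINAL size, then scale."""
    x = Fraction(x_chars - dw, W) * s_w
    y = Fraction(y_chars - dh, H) * s_h
    return x, y


print(to_cropped(5, 5) == (1, 1))  # q maps to (1, 1): True
print(to_cropped(2, 3) == (0, 0))  # O' maps to (0, 0): True
```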

The algorithm

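We already have normalized values, so the matter is simpler.

For a box (x, y, w, h) in the original image, we can calculate the box (x', y', w', h') in the cropped image like this:

$$x' = (x - \Delta w)\, s_w, \qquad y' = (y - \Delta h)\, s_h, \qquad w' = w\, s_w, \qquad h' = h\, s_h$$

where Δw and Δh are the normalized coordinates of the crop's top left corner O' in the original image (Δw = 2/7 and Δh = 3/5 for the tree trunk), s_w = W/W' and s_h = H/H'. For a crop taken equally from all sides, Δw = (W - W')/(2W) and Δh = (H - H')/(2H).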
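
A direct transcription of these formulas, plus a helper for the case in the question, where the same amount is cropped from every side. Both function names are made up for this sketch.

```python
def crop_box(x, y, w, h, dw, dh, s_w, s_h):
    """x' = (x - dw) * s_w, y' = (y - dh) * s_h, w' = w * s_w, h' = h * s_h."""
    return (x - dw) * s_w, (y - dh) * s_h, w * s_w, h * s_h


def centered_crop_params(W, H, W_new, H_new):
    """Normalized offsets and scale factors for a crop centered in a W x H image."""
    dw = (W - W_new) / 2 / W
    dh = (H - H_new) / 2 / H
    return dw, dh, W / W_new, H / H_new


# 800x600 -> 400x300 center crop, as in the question
params = centered_crop_params(800, 600, 400, 300)
print(params)                                   # (0.25, 0.25, 2.0, 2.0)
print(crop_box(0.5, 0.5, 0.25, 0.25, *params))  # (0.5, 0.5, 0.5, 0.5)
```

Writing the offset relative to the crop's center instead of its top left corner gives the same result; that is the form the Python function that follows uses.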

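In Python:

    def transpose_annot(x_c, y_c, w_c, h_c, annotations):
        """Convert YOLO annotations to the coordinates of a cropped image.

        x_c, y_c: center of the cropped area, normalized to the original image
        w_c, h_c: size of the cropped area, normalized to the original image
        annotations: tuples (n, x, y, w, h) or (x, y, w, h)
        """
        # scaling factors: the crop keeps w_c of the width and h_c of the height
        s_w = 1 / w_c
        s_h = 1 / h_c
        new_annots = []

        for annot in annotations:
            # the class label n is optional
            if len(annot) == 5:
                n, x, y, w, h = annot
            else:
                n = None
                x, y, w, h = annot
            w_ = w * s_w
            h_ = h * s_h
            # offset of the box center from the center of the cropping area
            delta_x = x - x_c
            delta_y = y - y_c
            # the center of the cropping area is the new center of the image,
            # so we scale the offset and add it to 0.5
            x_ = 0.5 + delta_x * s_w
            y_ = 0.5 + delta_y * s_h
            if n is None:
                new_annots.append((x_, y_, w_, h_))
            else:
                new_annots.append((n, x_, y_, w_, h_))
        return new_annots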
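
For the question's case it may be handier to work on the label text directly. This is a sketch, not part of the answer's method: crop_label_text is a hypothetical helper that applies the same center-based formula to every "n x y w h" line and writes the result with 6 decimals. It assumes every line has a class index.

```python
def crop_label_text(text, x_c, y_c, w_c, h_c):
    """Rewrite every 'n x y w h' line for a crop whose center is (x_c, y_c)
    and whose size is (w_c, h_c), all normalized to the original image."""
    out = []
    for line in text.strip().splitlines():
        n, x, y, w, h = line.split()
        x, y, w, h = map(float, (x, y, w, h))
        x_new = 0.5 + (x - x_c) / w_c
        y_new = 0.5 + (y - y_c) / h_c
        out.append(f"{n} {x_new:.6f} {y_new:.6f} {w / w_c:.6f} {h / h_c:.6f}")
    return "\n".join(out)


labels = """0 0.545000 0.722500 0.042500 0.091667
1 0.518750 0.762500 0.097500 0.271667"""

# Centered 400x300 crop of an 800x600 image: center (0.5, 0.5), size (0.5, 0.5)
print(crop_label_text(labels, 0.5, 0.5, 0.5, 0.5))
# 0 0.590000 0.945000 0.085000 0.183334
# 1 0.537500 1.025000 0.195000 0.543334
```

Note that with these particular labels both boxes reach past the bottom edge of the crop (y' + h'/2 > 1), and the second box's center is even outside it, so this file needs the corrections described next.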

correcting annotations

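After cropping, some annotations may lie completely outside the new image and need to be dropped, while others are only partially cropped and need to be adjusted.

As mentioned before, all values must be in the interval [0, 1].

An annotation is completely cropped out if, after converting it to the cropped image's coordinates, it lies entirely outside that interval in x or in y:

$$x' + \tfrac{w'}{2} \le 0 \quad\text{or}\quad x' - \tfrac{w'}{2} \ge 1 \quad\text{or}\quad y' + \tfrac{h'}{2} \le 0 \quad\text{or}\quad y' - \tfrac{h'}{2} \ge 1$$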
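
The same condition as a small Python check, assuming the box has already been converted to the cropped image's coordinates (completely_cropped is a hypothetical name). Boxes that only touch the border are treated as cropped out, since no part of them is visible.

```python
def completely_cropped(x, y, w, h):
    """True if a box, in the cropped image's normalized coordinates,
    has no overlap with the [0, 1] x [0, 1] frame."""
    return (x + w / 2 <= 0 or x - w / 2 >= 1 or
            y + h / 2 <= 0 or y - h / 2 >= 1)


print(completely_cropped(1.25, 0.5, 0.25, 0.25))  # True: left edge is at 1.125
print(completely_cropped(0.5, 0.5, 0.25, 0.25))   # False: fully inside
```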

partially cropped

If you want to keep or drop annotations depending on how much of them is still visible, for example dropping those with less than 1/4 of their area in frame (keeping a visible fraction in [0.25, 1]), it gets more complicated.

         x
     _________   
     |   .   |
     |   .   |
 y...|.0-*---|-------->1
     | |     | h
     |_______|
       |  w
       V 
       1

intersection area in cropped image

We can view this problem as calculating the intersection area between two rectangles: the frame and the annotation. For convenience, the function also returns the fraction of the annotation's area that is in frame, together with the clipped box.

    def new_annotation_area(x, y, w, h):
        """Intersect a box (in the cropped image's normalized coordinates)
        with the frame [0, 1] x [0, 1].

        Returns the fraction of the box's area that is in frame and the
        clipped box (c_x, c_y, c_w, c_h).
        """
        # ________
        # |  a   |
        # |   ___|______
        # |   |c |     |
        # |___|__|  b  |
        #     |________|
        # a is the coordinate system (given)
        # b is the annotation in the coordinate system
        # c is the intersection area
        a_x = 0.5
        a_y = 0.5
        a_w = 1
        a_h = 1
        a_max_x = a_x + a_w / 2
        a_min_x = a_x - a_w / 2

        b_max_x = x + w / 2
        b_min_x = x - w / 2

        # start from the one-dimensional case:
        # how much do two lines overlap/intersect?
        # from there it is easy to get to the area
        #  a_min_x----------a_max_x
        #        b_min_x----------b_max_x
        #        c_min_x----c_max_x

        c_min_x = max(a_min_x, b_min_x)
        c_max_x = min(a_max_x, b_max_x)
        # clamp at 0 so a box that misses the frame gets area 0; otherwise the
        # area could be negative, or even positive if both lengths are negative
        c_len_x = max(0.0, c_max_x - c_min_x)

        a_max_y = a_y + a_h / 2
        a_min_y = a_y - a_h / 2

        b_max_y = y + h / 2
        b_min_y = y - h / 2

        c_min_y = max(a_min_y, b_min_y)
        c_max_y = min(a_max_y, b_max_y)
        c_len_y = max(0.0, c_max_y - c_min_y)
        area = c_len_y * c_len_x

        c_w = c_len_x
        c_h = c_len_y
        c_x = c_min_x + 0.5 * c_w
        c_y = c_min_y + 0.5 * c_h

        return area / (w * h), (c_x, c_y, c_w, c_h)
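
To decide which annotations to keep, the visible fraction can be compared with a threshold. The sketch below is a compact, independent variant of the intersection above (clipping each edge to [0, 1] directly) plus a filter; clip_box, filter_boxes and the default threshold of 0.25 are assumptions chosen for illustration.

```python
def clip_box(x, y, w, h):
    """Clip a normalized box to [0, 1] x [0, 1].

    Returns (visible_fraction, (x, y, w, h)) for the clipped box; the
    fraction is 0.0 when the box lies completely outside the frame.
    """
    left, right = max(0.0, x - w / 2), min(1.0, x + w / 2)
    top, bottom = max(0.0, y - h / 2), min(1.0, y + h / 2)
    c_w, c_h = max(0.0, right - left), max(0.0, bottom - top)
    fraction = (c_w * c_h) / (w * h)
    return fraction, (left + c_w / 2, top + c_h / 2, c_w, c_h)


def filter_boxes(boxes, min_visible=0.25):
    """Keep (n, x, y, w, h) boxes with at least `min_visible` of their area
    in frame, replacing each kept box with its clipped version."""
    kept = []
    for n, x, y, w, h in boxes:
        fraction, clipped = clip_box(x, y, w, h)
        if fraction >= min_visible:
            kept.append((n, *clipped))
    return kept


boxes = [(0, 0.5, 0.5, 0.25, 0.25),   # fully inside
         (1, 0.5, 1.0, 0.5, 0.5),     # half of it below the bottom edge
         (2, 0.5, 1.25, 0.25, 0.25)]  # completely below the frame
print(filter_boxes(boxes))
# [(0, 0.5, 0.5, 0.25, 0.25), (1, 0.5, 0.875, 0.5, 0.25)]
```

Clipping the kept boxes keeps all values inside [0, 1]; whether a clipped box is still a useful training label is a judgment call that the threshold only approximates.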