Glaster

Problem converting a dataset from LabelImg XML to YOLO txt format: width and height are swapped for some bounding boxes

When I convert a dataset from LabelImg XML (ImageNet) format to YOLO txt format, some bounding boxes come out with their width and height swapped (what should be the width ends up as the height, and vice versa).

To convert the data, I use the following functions:

import math
import os
import xml.etree.ElementTree as ET


def get_classes():
    """Reads the list of classes from classes.txt (one class name per line)"""
    with open('classes.txt', encoding='UTF-8') as f:
        return [row.replace('\n', '') for row in f]


def convert(size, box):
    """Converts a single robndbox to values normalized by the image size"""
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = box[0] * dw
    y = box[1] * dh
    w = box[2] * dw
    h = box[3] * dh
    return x, y, w, h


def convert_annotation(annotation_path):
    """Converts one XML annotation file to YOLO txt format"""
    out_name = os.path.splitext(os.path.basename(annotation_path))[0] + '.txt'
    with open(annotation_path, encoding='UTF-8') as in_file:
        root = ET.parse(in_file).getroot()
    size = root.find('size')
    classes = get_classes()
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    # "with" makes sure the label file is flushed and closed
    with open(os.path.join('./labels', out_name), 'w') as out_file:
        for obj in root.iter('object'):
            difficult = obj.find('difficult').text
            cls = obj.find('name').text
            if cls not in classes or int(difficult) == 1:
                continue
            cls_id = classes.index(cls)
            xmlbox = obj.find('robndbox')
            # centre, width and height of the rotated box, in pixels
            b = (float(xmlbox.find('cx').text), float(xmlbox.find('cy').text),
                 float(xmlbox.find('w').text), float(xmlbox.find('h').text))
            # rotation angle from the XML, in radians
            theta = float(xmlbox.find('angle').text)
            # offset of 1.5 rad (about 86 degrees) applied to the angle
            theta -= 1.5
            b1, b2, b3, b4 = b
            if b3 < b4:
                # put the longer side first and rotate the angle by 90 degrees
                b = (b1, b2, b4, b3)
                theta = int(((theta * 180 / math.pi) + 90) % 180)
            else:
                theta = int(theta * 180 / math.pi)
            bb = convert((w, h), b)
            # print(f"{theta} - {cls}")
            out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + " " + str(theta) + '\n')
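
The `b3 < b4` branch above relies on the geometric fact that a rotated rectangle with sides (w, h) and angle θ covers the same region as one with sides (h, w) and angle θ + 90°. A minimal sketch that checks this by comparing corner points; the helpers `box_corners` and `same_region` are hypothetical, and the sketch assumes the angle is in radians with the `w` side lying along the x-axis when the angle is 0.

```python
import math


def box_corners(cx, cy, w, h, angle):
    """Hypothetical helper: corners of a rotated box, angle in radians.

    Assumes the w side lies along the x-axis when angle == 0.
    """
    c, s = math.cos(angle), math.sin(angle)
    corners = []
    for dx, dy in ((w / 2, h / 2), (-w / 2, h / 2), (-w / 2, -h / 2), (w / 2, -h / 2)):
        # rotate the offset from the centre, then translate it
        corners.append((cx + dx * c - dy * s, cy + dx * s + dy * c))
    return corners


def same_region(a, b, tol=1e-6):
    """True if both corner lists contain the same points, in any order."""
    return all(any(math.dist(p, q) < tol for q in b) for p in a)


# centre and sides from the transistor.bjt.pnp object, with an arbitrary angle
original = box_corners(1299.0, 375.5, 123.0, 106.0, 0.3)
swapped_and_rotated = box_corners(1299.0, 375.5, 106.0, 123.0, 0.3 + math.pi / 2)
swapped_only = box_corners(1299.0, 375.5, 106.0, 123.0, 0.3)

print(same_region(original, swapped_and_rotated))  # True
print(same_region(original, swapped_only))         # False
```

Swapping the sides without also adding 90° to the angle describes a different box, which is why the swap and the angle change have to happen together.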

What I expected to get as a result:

[image: the annotated radio-component image in labelImg2]

What I actually got:

[image: the same image with the converted YOLO boxes]

In this image, elements such as terminal, transistor.bjt.pnp, resistor.adjustable and some of the resistors have bounding boxes with their width and height swapped.

For example, two of these elements (the transistor and the adjustable resistor) have the following XML markup:

<object>
    <name>transistor.bjt.pnp</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <robndbox>
        <cx>1299.0</cx>
        <cy>375.5</cy>
        <w>123.0</w>
        <h>106.0</h>
        <angle>1.570796</angle>
    </robndbox>
    <extra/>
</object>
<object>
    <name>resistor.adjustable</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <robndbox>
        <cx>658.0</cx>
        <cy>479.0</cy>
        <w>122.0</w>
        <h>96.0</h>
        <angle>1.55226</angle>
    </robndbox>
    <extra/>
</object>
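
To see exactly what `convert_annotation` writes for these two objects, its angle and size logic can be replayed on their values. A minimal sketch, assuming the same steps as the function (subtract 1.5 from the angle, swap the sides only when `w < h`, convert to whole degrees); the helper name `converted_angle_and_size` is hypothetical, and the normalization by image size is left out because it does not affect the angle or the side order.

```python
import math


def converted_angle_and_size(w, h, angle):
    """Hypothetical helper replaying the angle/size steps of convert_annotation."""
    theta = angle - 1.5
    if w < h:
        # same branch as in convert_annotation: swap sides, add 90 degrees
        w, h = h, w
        theta = int(((theta * 180 / math.pi) + 90) % 180)
    else:
        theta = int(theta * 180 / math.pi)
    return w, h, theta


# (w, h, angle) from the two <robndbox> elements above
samples = {
    "transistor.bjt.pnp": (123.0, 106.0, 1.570796),
    "resistor.adjustable": (122.0, 96.0, 1.55226),
}
for name, (w, h, angle) in samples.items():
    print(name, "XML angle (deg):", round(math.degrees(angle), 1),
          "-> written (w, h, theta):", converted_angle_and_size(w, h, angle))
```

For both objects the XML angle is close to 90° (π/2), but the written angle is only a few degrees (4 and 2), while `w` and `h` keep their original order because `w > h`.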

How could I solve this problem? Thank you in advance for your help.
