Ankit

Reputation: 203

Convert annotation xml to text in Python

I have a folder containing a large number of XML files with image annotation data. I want to convert these XML files to text files so that they can be used to train a YOLO model.

I generated the XML files by labelling the images. Here is one of them:

<annotation>
    <folder>train</folder>
    <filename>img_1.jpg</filename>
    <path>/home/avnika/images_used_for _project/train/img_1.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>310</width>
        <height>163</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>person</name>
        <pose>Unspecified</pose>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>193</xmin>
            <ymin>40</ymin>
            <xmax>237</xmax>
            <ymax>163</ymax>
        </bndbox>
    </object>
</annotation>
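For reference, a YOLO-style text label (the Darknet/Ultralytics convention, assumed here) stores one line per box: a class id, then the box center x, center y, width and height, each divided by the image width W or height H so that all values lie between 0 and 1. For the sample annotation above (W = 310, H = 163), the arithmetic works out as:

```latex
\begin{aligned}
x_c &= \frac{x_{\min} + x_{\max}}{2W} = \frac{193 + 237}{2 \cdot 310} \approx 0.6935, &
y_c &= \frac{y_{\min} + y_{\max}}{2H} = \frac{40 + 163}{2 \cdot 163} \approx 0.6227, \\
w &= \frac{x_{\max} - x_{\min}}{W} = \frac{237 - 193}{310} \approx 0.1419, &
h &= \frac{y_{\max} - y_{\min}}{H} = \frac{163 - 40}{163} \approx 0.7546.
\end{aligned}
```

So, if person is class 0, the label line for this image would be `0 0.693548 0.622699 0.141935 0.754601`.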

Below is my code so far:

from xml.etree.ElementTree import ElementTree
import sys
import os
import glob
from glob import glob

def read_xml(f,op):

    if not os.path.exists(op):
        os.makedirs(op,exist_ok=True)

    file_n = glob(f)
    for i in range(len(file_n)):
        xcontent = ElementTree()
        xcontent.parse(file_n[i])

        doc = [xcontent.find("train").text,xcontent.find("filename").text,xcontent.find("path").text,xcontent.find("width").text,
            xcontent.find("height").text,xcontent.find("depth").text,xcontent.find("name").text,xcontent.find("xmin").text,
            xcontent.find("ymin").text,xcontent.find("xmax").text,xcontent.find("ymax").text]

        out = open(file_n[i]+".txt","w")
        out.write(op)



if __name__ == '__main__':

    files=("C:\\multi_cat_3\\models\\research\\object_detection\\images\\train_xmls\\*")
    op_path=("C:\\multi_cat_3\\models\\research\\object_detection\\images\\train_xmls_op")

    read_xml(files,op_path)

I want to extract these values and write them out in text format, but the code raises the following error:

Traceback (most recent call last):
  File "C:/Users/128938/PycharmProjects/augmentation_code/test_file.py", line 31, in <module>
    read_xml(files,op_path)
  File "C:/Users/128938/PycharmProjects/augmentation_code/test_file.py", line 17, in read_xml
    doc = [xcontent.find("train").text,xcontent.find("filename").text,xcontent.find("path").text,xcontent.find("width").text,
AttributeError: 'NoneType' object has no attribute 'text'

Upvotes: 1

Views: 1534

Answers (3)

Abdurrahman

Reputation: 71

Hope you don't mind me replying after a few years.

I faced the same issue and found a GitHub repository that does this conversion: Data-annotation.

Upvotes: 0

Miladfa7

Reputation: 412

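import os
import xml.etree.ElementTree as ET

label_dir = '../Drone3/label/'
txt_dir = '../Drone3/label_txt/'
os.makedirs(txt_dir, exist_ok=True)

for xml_file in os.listdir(label_dir):
    if not xml_file.endswith(".xml"):
        continue
    root = ET.parse(os.path.join(label_dir, xml_file)).getroot()

    # One line per <object>: class id (hard-coded to 0) and the raw pixel corners.
    # Note: YOLO itself expects normalized center x, center y, width and height.
    lines = []
    for bndbox in root.findall("./object/bndbox"):
        xmin = bndbox.find("xmin").text
        ymin = bndbox.find("ymin").text
        xmax = bndbox.find("xmax").text
        ymax = bndbox.find("ymax").text
        lines.append("0" + " " + xmin + " " + ymin + " " + xmax + " " + ymax)

    name = os.path.splitext(xml_file)[0]
    with open(os.path.join(txt_dir, name + ".txt"), "w") as txt:
        txt.write("\n".join(lines))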
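The code above writes pixel corner coordinates with a hard-coded class 0. YOLO-style label files (Darknet/Ultralytics convention, assumed here) instead expect each box as a class id followed by the normalized center x, center y, width and height. The following is a minimal sketch of that conversion, assuming a hypothetical `CLASSES` list whose order defines the class ids; `annotation_to_yolo_lines` and `convert_folder` are illustrative names, not part of any library.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical class list: the position of each name becomes its YOLO class id.
CLASSES = ["person"]


def annotation_to_yolo_lines(root, classes=CLASSES):
    """Return YOLO label lines ("class x_c y_c w h", normalized) for one parsed annotation."""
    img_w = float(root.findtext("size/width"))
    img_h = float(root.findtext("size/height"))

    lines = []
    for obj in root.findall("object"):  # every box, not only the first one
        name = obj.findtext("name")
        if name not in classes:
            continue  # skip labels that are not in the class list
        box = obj.find("bndbox")
        xmin = float(box.findtext("xmin"))
        ymin = float(box.findtext("ymin"))
        xmax = float(box.findtext("xmax"))
        ymax = float(box.findtext("ymax"))

        # Box center and size, each divided by the image width or height.
        x_c = (xmin + xmax) / 2 / img_w
        y_c = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{classes.index(name)} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return lines


def convert_folder(xml_dir, out_dir, classes=CLASSES):
    """Write one .txt label file into out_dir for every .xml file in xml_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for fname in os.listdir(xml_dir):
        if not fname.lower().endswith(".xml"):
            continue
        root = ET.parse(os.path.join(xml_dir, fname)).getroot()
        lines = annotation_to_yolo_lines(root, classes)
        stem = os.path.splitext(fname)[0]
        with open(os.path.join(out_dir, stem + ".txt"), "w") as f:
            f.write("\n".join(lines) + ("\n" if lines else ""))
```

With the folders from the question, usage would look like `convert_folder(r"C:\multi_cat_3\models\research\object_detection\images\train_xmls", r"C:\multi_cat_3\models\research\object_detection\images\train_xmls_op")`. Keeping the parsing in a function that takes an already-parsed root makes it easy to check the numbers on a single annotation before converting a whole folder.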

Upvotes: 0

Rajasekar G

Reputation: 31

In your code:

doc = [xcontent.find("train").text,xcontent.find("filename").text,xcontent.find("path").text,xcontent.find("width").text,
            xcontent.find("height").text,xcontent.find("depth").text,xcontent.find("name").text,xcontent.find("xmin").text,
            xcontent.find("ymin").text,xcontent.find("xmax").text,xcontent.find("ymax").text]

You search for a train tag, but in your XML, train is the text of the folder tag:

<annotation>
    <folder>train</folder>
    <filename>img_1.jpg</filename>
    <path>/home/avnika/images_used_for _project/train/img_1.jpg</path>
    <source>

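Replace that part of the code as shown below. find() returns None when it cannot find a matching element, and calling .text on None raises the AttributeError. Also note that find() only searches the direct children of the element it is called on, so nested tags such as width (inside size) or xmin (inside object/bndbox) need a path like size/width or object/bndbox/xmin.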

doc = [xcontent.find("folder").text, xcontent.find("filename").text, xcontent.find("path").text,
       xcontent.find("size/width").text, xcontent.find("size/height").text, xcontent.find("size/depth").text,
       xcontent.find("object/name").text, xcontent.find("object/bndbox/xmin").text,
       xcontent.find("object/bndbox/ymin").text, xcontent.find("object/bndbox/xmax").text,
       xcontent.find("object/bndbox/ymax").text]
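A quick way to check how find() resolves tags is to run it on a trimmed copy of the annotation from the question. This sketch (the `SAMPLE` string is simply that XML inlined) shows that find() only matches direct children of the element it is called on. A nested tag therefore needs a path such as size/width, or .// to search at any depth.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the annotation from the question.
SAMPLE = """<annotation>
    <folder>train</folder>
    <filename>img_1.jpg</filename>
    <size><width>310</width><height>163</height><depth>3</depth></size>
    <object>
        <name>person</name>
        <bndbox><xmin>193</xmin><ymin>40</ymin><xmax>237</xmax><ymax>163</ymax></bndbox>
    </object>
</annotation>"""

root = ET.fromstring(SAMPLE)

# find() only looks at the direct children of root, so both of these are None:
print(root.find("train"))  # None: there is no <train> element at all
print(root.find("width"))  # None: <width> is nested inside <size>

# A path (or ".//" for any depth) reaches the nested elements:
print(root.find("folder").text)      # train
print(root.find("size/width").text)  # 310
print(root.find(".//xmin").text)     # 193
```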

See the ElementTree XML API documentation (https://docs.python.org/3/library/xml.etree.elementtree.html#module-xml.etree.ElementTree) for how to get the root element, attribute data, tag text, and so on.

Upvotes: 1
