Reputation: 203
I have a folder containing a large number of XML files with image annotation data. I want to convert these XML files to text files so that they can be used to train a YOLO model.
I generated the XML files by labelling the images. A sample file looks like this:
<annotation>
    <folder>train</folder>
    <filename>img_1.jpg</filename>
    <path>/home/avnika/images_used_for _project/train/img_1.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>310</width>
        <height>163</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>person</name>
        <pose>Unspecified</pose>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>193</xmin>
            <ymin>40</ymin>
            <xmax>237</xmax>
            <ymax>163</ymax>
        </bndbox>
    </object>
</annotation>
Below is my code so far
from xml.etree.ElementTree import ElementTree
import sys
import os
import glob
from glob import glob

def read_xml(f,op):
    if not os.path.exists(op):
        os.makedirs(op,exist_ok=True)
    file_n = glob(f)
    for i in range(len(file_n)):
        xcontent = ElementTree()
        xcontent.parse(file_n[i])
        doc = [xcontent.find("train").text,xcontent.find("filename").text,xcontent.find("path").text,xcontent.find("width").text,
               xcontent.find("height").text,xcontent.find("depth").text,xcontent.find("name").text,xcontent.find("xmin").text,
               xcontent.find("ymin").text,xcontent.find("xmax").text,xcontent.find("ymax").text]
        out = open(file_n[i]+".txt","w")
        out.write(op)

if __name__ == '__main__':
    files=("C:\\multi_cat_3\\models\\research\\object_detection\\images\\train_xmls\\*")
    op_path=("C:\\multi_cat_3\\models\\research\\object_detection\\images\\train_xmls_op")
    read_xml(files,op_path)
I want to extract these values and write them out as text, but the code fails with the following error:
Traceback (most recent call last):
  File "C:/Users/128938/PycharmProjects/augmentation_code/test_file.py", line 31, in <module>
    read_xml(files,op_path)
  File "C:/Users/128938/PycharmProjects/augmentation_code/test_file.py", line 17, in read_xml
    doc = [xcontent.find("train").text,xcontent.find("filename").text,xcontent.find("path").text,xcontent.find("width").text,
AttributeError: 'NoneType' object has no attribute 'text'
Upvotes: 1
Views: 1534
Reputation: 71
I hope you don't mind a reply a few years later.
I faced the same issue and found a GitHub repository that can do this conversion: Data-annotation
Upvotes: 0
Reputation: 412
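import os
import xml.etree.ElementTree as ET  # needed for ET.parse below

# List the annotation files in the label folder (top level only)
xml_label = [x for x in os.walk('../Drone3/label/')]
xml_label = xml_label[0][2]
for xml in xml_label:
    xml_sp = xml.split(".")
    tree = ET.parse("../Drone3/label/"+xml)
    root = tree.getroot()
    # Corner coordinates of the first <object> bounding box, as strings
    xmin = root.find("./object/bndbox/xmin").text
    ymin = root.find("./object/bndbox/ymin").text
    xmax = root.find("./object/bndbox/xmax").text
    ymax = root.find("./object/bndbox/ymax").text
    # Class id 0 followed by the raw pixel corners (not normalized)
    data = "0" + " " + xmin + " " + ymin+ " " + xmax + " " + ymax
    # Write one .txt file per XML file; the with block closes it
    with open('../Drone3/label_txt/'+xml_sp[0]+".txt","w+") as txt:
        txt.write(data)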
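Note that YOLO label files usually expect each line as class id, x center, y center, width, height, with the four box values divided by the image size, rather than raw pixel corners. Below is a minimal sketch of that conversion, assuming the commonly used Darknet-style text format. The function names voc_box_to_yolo and yolo_line and the class id 0 are placeholders for illustration, not part of the snippet above.

```python
def voc_box_to_yolo(img_w, img_h, xmin, ymin, xmax, ymax):
    """Convert pixel corner coordinates to a normalized center/size box."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    box_w = (xmax - xmin) / img_w
    box_h = (ymax - ymin) / img_h
    return x_center, y_center, box_w, box_h


def yolo_line(class_id, img_w, img_h, xmin, ymin, xmax, ymax):
    """Format one label line: class id followed by four normalized values."""
    values = voc_box_to_yolo(img_w, img_h, xmin, ymin, xmax, ymax)
    return str(class_id) + " " + " ".join(f"{v:.6f}" for v in values)


# Values taken from the sample XML in the question (310x163 image).
line = yolo_line(0, 310, 163, 193, 40, 237, 163)
print(line)
```

The image width and height come from the size tag of the same XML file, so they can be read with root.find("./size/width") and root.find("./size/height") and converted with int() before calling the function.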
Upvotes: 0
Reputation: 31
In your code:
doc = [xcontent.find("train").text,xcontent.find("filename").text,xcontent.find("path").text,xcontent.find("width").text,
xcontent.find("height").text,xcontent.find("depth").text,xcontent.find("name").text,xcontent.find("xmin").text,
xcontent.find("ymin").text,xcontent.find("xmax").text,xcontent.find("ymax").text]
You are looking for a tag named train, but in your XML train is only the text inside the folder tag; the tag itself is called folder:
<annotation>
<folder>train</folder>
<filename>img_1.jpg</filename>
<path>/home/avnika/images_used_for _project/train/img_1.jpg</path>
<source>
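Replace that part of the code with the version below. The find method looks for a matching element and returns None when it does not find one, which is why accessing .text fails. Also note that a bare tag name only matches direct children of the root, so nested tags such as width or xmin need a path:
doc = [xcontent.find("folder").text,xcontent.find("filename").text,xcontent.find("path").text,xcontent.find("size/width").text,
xcontent.find("size/height").text,xcontent.find("size/depth").text,xcontent.find("object/name").text,xcontent.find("object/bndbox/xmin").text,
xcontent.find("object/bndbox/ymin").text,xcontent.find("object/bndbox/xmax").text,xcontent.find("object/bndbox/ymax").text]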
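To see where the None comes from, here is a minimal sketch using a shortened copy of the XML from the question (the SAMPLE string and the variable names are only for illustration). With ElementTree, a bare tag name passed to find matches direct children of the root only, so a nested tag needs either a path or the .// prefix.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the annotation file from the question.
SAMPLE = """<annotation>
  <folder>train</folder>
  <size><width>310</width><height>163</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <bndbox><xmin>193</xmin><ymin>40</ymin><xmax>237</xmax><ymax>163</ymax></bndbox>
  </object>
</annotation>"""

root = ET.fromstring(SAMPLE)

missing = root.find("train")                 # no <train> tag exists, so this is None
folder = root.find("folder").text            # "train" is the text of <folder>
direct_width = root.find("width")            # <width> is not a direct child, so None
nested_width = root.find("size/width").text  # explicit path from the root
any_xmin = root.find(".//xmin").text         # .// searches all descendants

print(missing, folder, direct_width, nested_width, any_xmin)
```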
Refer to the ElementTree XML API documentation at https://docs.python.org/3/library/xml.etree.elementtree.html#module-xml.etree.ElementTree for how to get the root element, attribute data, tag text, and so on.
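As a rough illustration of those parts of the API, the sketch below reads the root tag, uses findtext (which returns a default instead of None when a tag is missing) and loops over every object with findall, since an annotation file may contain more than one box. The helper name parse_annotation and the sample string are made up for this example, and it assumes the coordinates are integer pixel values.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<annotation>
  <filename>img_1.jpg</filename>
  <size><width>310</width><height>163</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <bndbox><xmin>193</xmin><ymin>40</ymin><xmax>237</xmax><ymax>163</ymax></bndbox>
  </object>
</annotation>"""


def parse_annotation(xml_text):
    """Collect the image size and all bounding boxes from one annotation."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        if box is None:
            continue  # skip objects without a bounding box
        entry = {"name": obj.findtext("name", default="")}
        # Assumes all four corner tags are present and hold integers.
        for key in ("xmin", "ymin", "xmax", "ymax"):
            entry[key] = int(box.findtext(key))
        objects.append(entry)
    return {
        "root_tag": root.tag,
        "filename": root.findtext("filename", default=""),
        "width": int(root.findtext("size/width", default="0")),
        "height": int(root.findtext("size/height", default="0")),
        "objects": objects,
    }


result = parse_annotation(SAMPLE)
print(result)
```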
Upvotes: 1