Reputation: 781
I have an XML file that contains the coordinates of the bounding boxes in an image. Consider the following:
<annotation>
<path>C:\Test_Folder\testfile_1.jpg</path>
<size>
<width>1280</width>
<height>720</height>
</size>
<object>
<name>Bus</name>
<bndbox>
<xmin>316</xmin>
<ymin>232</ymin>
<xmax>413</xmax>
<ymax>403</ymax>
</bndbox>
</object>
<object>
<name>Car</name>
<bndbox>
<xmin>595</xmin>
<ymin>257</ymin>
<xmax>962</xmax>
<ymax>362</ymax>
</bndbox>
</object>
</annotation>
I want to loop through all object tags and extract each object's name and coordinates, either as plain values or as a pandas DataFrame:
Bus, 316, 232, 413, 403
...
This is what I have written so far; I do not know how to proceed further.
import xml.etree.ElementTree as ET
path = r"C:\Test_Folder\testfile_1.xml"  # raw string, so "\t" is not read as a tab
etree = ET.parse(path)
root = etree.getroot()
for child in root:
    pass  # don't know what next.
Upvotes: 0
Views: 317
Reputation: 23815
The following should work. It finds every object element and reads its name and bounding-box values:
import xml.etree.ElementTree as ET
import pandas as pd
xml = r'''
<annotation>
<path>C:\Test_Folder\testfile_1.jpg</path>
<size>
<width>1280</width>
<height>720</height>
</size>
<object>
<name>Bus</name>
<bndbox>
<xmin>316</xmin>
<ymin>232</ymin>
<xmax>413</xmax>
<ymax>403</ymax>
</bndbox>
</object>
<object>
<name>Car</name>
<bndbox>
<xmin>595</xmin>
<ymin>257</ymin>
<xmax>962</xmax>
<ymax>362</ymax>
</bndbox>
</object>
</annotation>'''
data = []
root = ET.fromstring(xml)
for obj in root.findall('.//object'):
    data.append({'name': obj.find('name').text,
                 'xmin': obj.find('.//xmin').text,
                 'ymin': obj.find('.//ymin').text,
                 'xmax': obj.find('.//xmax').text,
                 'ymax': obj.find('.//ymax').text})
df = pd.DataFrame(data)
print(df)
Output:
  name xmin ymin xmax ymax
0  Bus  316  232  413  403
1  Car  595  257  962  362
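Note that `.text` returns strings, so the coordinate columns above are text rather than numbers. If numeric values are needed, a variation along these lines might help. It is only a sketch: `read_boxes` and `SAMPLE` are names made up for this example, and it assumes every object has a `name` and a complete `bndbox`.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the annotation from the question (hypothetical constant name).
SAMPLE = """
<annotation>
  <object>
    <name>Bus</name>
    <bndbox><xmin>316</xmin><ymin>232</ymin><xmax>413</xmax><ymax>403</ymax></bndbox>
  </object>
  <object>
    <name>Car</name>
    <bndbox><xmin>595</xmin><ymin>257</ymin><xmax>962</xmax><ymax>362</ymax></bndbox>
  </object>
</annotation>
"""


def read_boxes(root):
    """Return one dict per <object>, with the coordinates converted to int.

    Hypothetical helper; assumes each <object> has <name> and a full <bndbox>.
    """
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        row = {"name": obj.findtext("name")}
        for key in ("xmin", "ymin", "xmax", "ymax"):
            row[key] = int(box.findtext(key))
        rows.append(row)
    return rows


rows = read_boxes(ET.fromstring(SAMPLE))
for row in rows:
    print(row)
```

To read the file from the question instead of a string, `ET.parse(path).getroot()` can be passed to the same function (with the Windows path written as a raw string). Passing `rows` to `pd.DataFrame(rows)` should then give a DataFrame whose coordinate columns are integers.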
Upvotes: 3