pkj

Reputation: 781

Parsing a .xml document

I have an XML file that contains the coordinates of the bounding boxes in an image. Consider the following:

<annotation>
    <path>C:\Test_Folder\testfile_1.jpg</path>
    <size>
        <width>1280</width>
        <height>720</height>
    </size>
    <object>
        <name>Bus</name>
        <bndbox>
            <xmin>316</xmin>
            <ymin>232</ymin>
            <xmax>413</xmax>
            <ymax>403</ymax>
        </bndbox>
    </object>
    <object>
        <name>Car</name>
        <bndbox>
            <xmin>595</xmin>
            <ymin>257</ymin>
            <xmax>962</xmax>
            <ymax>362</ymax>
        </bndbox>
    </object>
</annotation>

I want to loop through all object tags and extract each object's name and coordinates, either as plain rows or as a pandas DataFrame:

Bus, 316, 232, 413, 403
...

This is what I have written so far; I do not know how to proceed further.

import xml.etree.ElementTree as ET
path = r"C:\Test_Folder\testfile_1.xml"  # raw string, so "\t" is not read as a tab
etree = ET.parse(path)
root = etree.getroot()
for child in root:
    pass  # don't know what next.

Upvotes: 0

Views: 317

Answers (1)

balderman

Reputation: 23815

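The code below should work. It collects the name and bounding-box values of every object element into a list of dicts and builds a DataFrame from it.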

import xml.etree.ElementTree as ET
import pandas as pd

# Raw string, so the backslashes in the Windows path are kept as-is
xml = r'''
<annotation>
    <path>C:\Test_Folder\testfile_1.jpg</path>
    <size>
        <width>1280</width>
        <height>720</height>
    </size>
    <object>
        <name>Bus</name>
        <bndbox>
            <xmin>316</xmin>
            <ymin>232</ymin>
            <xmax>413</xmax>
            <ymax>403</ymax>
        </bndbox>
    </object>
    <object>
        <name>Car</name>
        <bndbox>
            <xmin>595</xmin>
            <ymin>257</ymin>
            <xmax>962</xmax>
            <ymax>362</ymax>
        </bndbox>
    </object>
</annotation>'''

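data = []
root = ET.fromstring(xml)
# Visit every <object> element, wherever it sits in the tree
for obj in root.findall('.//object'):
    # .text returns strings; wrap the values in int() if numeric columns are needed
    data.append({'name': obj.find('name').text,
                 'xmin': obj.find('.//xmin').text,
                 'ymin': obj.find('.//ymin').text,
                 'xmax': obj.find('.//xmax').text,
                 'ymax': obj.find('.//ymax').text})
# One row per object, one column per dict key
df = pd.DataFrame(data)
print(df)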

Output:

  name xmin ymin xmax ymax
0  Bus  316  232  413  403
1  Car  595  257  962  362
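
The question reads the XML from a file and also asks for plain rows such as "Bus, 316, 232, 413, 403". As a minimal sketch of that variant, assuming every object element has a name and a bndbox child with all four coordinates, the following uses only the standard library. The names read_boxes, BOX_FIELDS and SAMPLE are illustrative, and the temporary file only stands in for the real annotation file.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

BOX_FIELDS = ("xmin", "ymin", "xmax", "ymax")


def read_boxes(xml_path):
    """Return one (name, xmin, ymin, xmax, ymax) tuple per <object> element."""
    root = ET.parse(xml_path).getroot()
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # findtext returns the element text as a string; convert it to int
        coords = [int(box.findtext(field)) for field in BOX_FIELDS]
        rows.append((obj.findtext("name"), *coords))
    return rows


# Stand-in data so the sketch runs anywhere; pass a real file path instead.
SAMPLE = """<annotation>
    <object>
        <name>Bus</name>
        <bndbox><xmin>316</xmin><ymin>232</ymin><xmax>413</xmax><ymax>403</ymax></bndbox>
    </object>
    <object>
        <name>Car</name>
        <bndbox><xmin>595</xmin><ymin>257</ymin><xmax>962</xmax><ymax>362</ymax></bndbox>
    </object>
</annotation>"""

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.xml")
    with open(sample_path, "w", encoding="utf-8") as fh:
        fh.write(SAMPLE)
    rows = read_boxes(sample_path)

for row in rows:
    print(", ".join(str(value) for value in row))
# Bus, 316, 232, 413, 403
# Car, 595, 257, 962, 362
```

If a DataFrame is wanted, the same rows can be passed to pd.DataFrame(rows, columns=["name", *BOX_FIELDS]); the coordinate columns then hold integers rather than strings.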

Upvotes: 3
