ellaRT

Reputation: 1366

Parse an XML string in Python

I have this XML result and I need to get the values between the tags. The XML arrives as a plain Python string, not as a parsed document.

final = """  <Table><Claimable>false</Claimable><MinorRev>80601</MinorRev><Operation>530600 ION MILL</Operation><HTNum>162</HTNum><WaferEC>80318</WaferEC><HolderType>HACARR</HolderType><Job>167187008</Job></Table>

    <Table><Claimable>false</Claimable><MinorRev>71115</MinorRev><Operation>530600 ION MILL</Operation><Experiment>6794</Experiment><HTNum>162</HTNum><WaferEC>71105</WaferEC><HolderType>HACARR</HolderType><Job>16799006</Job></Table> """

This is my code sample:

import xml.etree.ElementTree as ET

root = ET.fromstring(final)
print(root)

And this is the error I am receiving:

xml.parsers.expat.ExpatError: The markup in the document following the root element must be well-formed.

I've tried using ET.fromstring, but with no luck.

Upvotes: 9

Views: 24004

Answers (2)

Maciej Lach

Reputation: 1691

Your XML is malformed: a document must have exactly one top-level element. From Wikipedia:

Each XML document has exactly one single root element. It encloses all the other elements and is therefore the sole parent element to all the other elements. ROOT elements are also called PARENT elements.
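To see the single-root rule in action, here is a minimal sketch, assuming Python 3 and a shortened version of the data from the question. The variable names (two_roots, one_root, error) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two sibling <Table> elements with no enclosing root (shortened from the question).
two_roots = "<Table><Job>167187008</Job></Table><Table><Job>16799006</Job></Table>"

try:
    ET.fromstring(two_roots)
    error = None
except ET.ParseError as exc:
    # The parser gives up once it finds content after the first root element closes.
    error = exc

print("parse failed:", error)

# The same content wrapped in a single root element parses without complaint.
one_root = ET.fromstring("<Tables>" + two_roots + "</Tables>")
print(len(one_root))  # number of <Table> children
```

On Python 3 the failure surfaces as xml.etree.ElementTree.ParseError rather than the ExpatError shown in the question.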

Try enclosing it in an additional tag (e.g. Tables) and then parse it with ET:

xmlData = '''<Tables>
<Table><Claimable>false</Claimable><MinorRev>80601</MinorRev><Operation>530600 ION MILL</Operation><HTNum>162</HTNum><WaferEC>80318</WaferEC><HolderType>HACARR</HolderType><Job>167187008</Job></Table>
<Table><Claimable>false</Claimable><MinorRev>71115</MinorRev><Operation>530600 ION MILL</Operation><Experiment>6794</Experiment><HTNum>162</HTNum><WaferEC>71105</WaferEC><HolderType>HACARR</HolderType><Job>16799006</Job></Table>
</Tables>
'''

import xml.etree.ElementTree as ET
xml = ET.fromstring(xmlData)

for table in xml.getiterator('Table'):
    for child in table:
        print child.tag, child.text

getiterator() has been deprecated since Python 2.7 and was removed in Python 3.9, so use iter('Table') instead. On Python 3, print is also a function:

for table in xml.iter('Table'):
    for child in table:
        print(child.tag, child.text)

This produces:

Claimable false
MinorRev 80601
Operation 530600 ION MILL
HTNum 162
WaferEC 80318
HolderType HACARR
Job 167187008
Claimable false
MinorRev 71115
Operation 530600 ION MILL
Experiment 6794
HTNum 162
WaferEC 71105
HolderType HACARR
Job 16799006
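If the string arrives at runtime rather than as a literal you can edit, the wrapping can be done in code. The following is only a sketch, assuming Python 3 and a fragment with no XML declaration at the start (wrapping would fail if there were one). The root name "Tables" and the variable rows are arbitrary choices, and the data is shortened from the question.

```python
import xml.etree.ElementTree as ET

# Shortened version of the string from the question: two <Table> siblings, no root.
final = """  <Table><Claimable>false</Claimable><HTNum>162</HTNum><Job>167187008</Job></Table>

    <Table><Claimable>false</Claimable><Experiment>6794</Experiment><HTNum>162</HTNum><Job>16799006</Job></Table> """

# Wrap the fragment in one synthetic root so the parser sees a single document.
root = ET.fromstring("<Tables>" + final + "</Tables>")

# Build one dict per <Table>, mapping each child tag to its text.
rows = [{child.tag: child.text for child in table} for table in root.iter("Table")]

for row in rows:
    print(row)
```

Every value comes back as a string, and a tag such as Experiment appears only in the rows that actually contain it.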

Upvotes: 18

adrianus

Reputation: 3199

Maybe you tried node.attrib; try node.text instead to get the string value (see also "Parsing XML" in the Python docs):

import xml.etree.ElementTree as ET
xml_string = "<Table><Claimable>false</Claimable><MinorRev>80601</MinorRev><Operation>530600 ION MILL</Operation><HTNum>162</HTNum><WaferEC>80318</WaferEC><HolderType>HACARR</HolderType><Job>167187008</Job></Table>"

root = ET.fromstring(xml_string)

# Each child of <Table> is one field; .text is the value between its tags.
for child in root:
    print(child.tag, child.text)

This should give you the following output:

Claimable false
MinorRev 80601
Operation 530600 ION MILL
HTNum 162
WaferEC 80318
HolderType HACARR
Job 167187008
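To make the difference between node.attrib and node.text concrete, here is a small sketch, assuming Python 3 and a shortened version of the same Table element. It also uses find() and findtext() to pick out a single field by name. The "n/a" fallback is just an illustrative default.

```python
import xml.etree.ElementTree as ET

xml_string = "<Table><Claimable>false</Claimable><HTNum>162</HTNum><Job>167187008</Job></Table>"
root = ET.fromstring(xml_string)

job = root.find("Job")
print(job.attrib)  # dict of XML attributes; empty here because <Job> has none
print(job.text)    # the text between <Job> and </Job>

# findtext() returns the text of the first matching child, or the default if it is absent.
ht_num = int(root.findtext("HTNum"))
experiment = root.findtext("Experiment", default="n/a")
print(ht_num, experiment)
```

The values are plain strings, so numeric fields such as HTNum need an explicit conversion.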

Upvotes: 3
