Reputation: 1366
I have this XML result and I need to get the values between the tags. The XML is held in a Python string, not in a file.
final = """<Table><Claimable>false</Claimable><MinorRev>80601</MinorRev><Operation>530600 ION MILL</Operation><HTNum>162</HTNum><WaferEC>80318</WaferEC><HolderType>HACARR</HolderType><Job>167187008</Job></Table>
<Table><Claimable>false</Claimable><MinorRev>71115</MinorRev><Operation>530600 ION MILL</Operation><Experiment>6794</Experiment><HTNum>162</HTNum><WaferEC>71105</WaferEC><HolderType>HACARR</HolderType><Job>16799006</Job></Table>"""
This is my code sample:
import xml.etree.ElementTree as ET
root = ET.fromstring(final)
print(root)
And this is the error I am receiving:
xml.parsers.expat.ExpatError: The markup in the document following the root element must be well-formed.
I've tried using ET.fromstring, but with no luck.
Upvotes: 9
Views: 24004
Reputation: 1691
Your XML is malformed. It has to have exactly one top level element. From Wikipedia:
Each XML document has exactly one single root element. It encloses all the other elements and is therefore the sole parent element to all the other elements. ROOT elements are also called PARENT elements.
Try enclosing it in an additional tag (e.g. Tables) and then parse it with ET:
xmlData = '''<Tables>
<Table><Claimable>false</Claimable><MinorRev>80601</MinorRev><Operation>530600 ION MILL</Operation><HTNum>162</HTNum><WaferEC>80318</WaferEC><HolderType>HACARR</HolderType><Job>167187008</Job></Table>
<Table><Claimable>false</Claimable><MinorRev>71115</MinorRev><Operation>530600 ION MILL</Operation><Experiment>6794</Experiment><HTNum>162</HTNum><WaferEC>71105</WaferEC><HolderType>HACARR</HolderType><Job>16799006</Job></Table>
</Tables>
'''
import xml.etree.ElementTree as ET
xml = ET.fromstring(xmlData)
for table in xml.getiterator('Table'):
    for child in table:
        # %-formatting prints the same text under Python 2 and Python 3
        print("%s %s" % (child.tag, child.text))
getiterator('Table') has been deprecated since Python 2.7 and was removed in Python 3.9; use iter('Table') instead:
for table in xml.iter('Table'):
    for child in table:
        print("%s %s" % (child.tag, child.text))
This produces:
Claimable false
MinorRev 80601
Operation 530600 ION MILL
HTNum 162
WaferEC 80318
HolderType HACARR
Job 167187008
Claimable false
MinorRev 71115
Operation 530600 ION MILL
Experiment 6794
HTNum 162
WaferEC 71105
HolderType HACARR
Job 16799006
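If the XML arrives as a string you cannot edit by hand, as in the question, the wrapping can be done in code. Below is a minimal Python 3 sketch of that idea. The helper name parse_fragments and the wrapper tag Tables are made up for illustration, the sample data is shortened from the question, and it assumes the fragment carries no XML declaration of its own.

```python
import xml.etree.ElementTree as ET

# Two sibling <Table> elements with no common root (shortened from the question).
final = (
    "<Table><Claimable>false</Claimable><MinorRev>80601</MinorRev>"
    "<Job>167187008</Job></Table>"
    "<Table><Claimable>false</Claimable><MinorRev>71115</MinorRev>"
    "<Experiment>6794</Experiment><Job>16799006</Job></Table>"
)

# Parsing the fragment directly fails because there is more than one root.
try:
    ET.fromstring(final)
    parsed_unwrapped = True
except ET.ParseError as exc:
    parsed_unwrapped = False
    print("unwrapped parse failed:", exc)


def parse_fragments(fragment, wrapper="Tables"):
    """Hypothetical helper: wrap a rootless fragment in one root and parse it."""
    return ET.fromstring("<{0}>{1}</{0}>".format(wrapper, fragment))


root = parse_fragments(final)

# One dict per <Table>, mapping each child tag to its text.
rows = [{child.tag: child.text for child in table} for table in root.iter("Table")]

for row in rows:
    # .get() returns None for tags a given table lacks, such as Experiment.
    print(row.get("Job"), row.get("Experiment"))
```

Collecting each table into a dict makes it easy to look up a value by tag name, and it copes with tables that have different sets of children.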
Upvotes: 18
Reputation: 3199
Maybe you tried node.attrib; try node.text instead to get the string value (also see Parsing XML in the Python docs):
import xml.etree.ElementTree as ET
xml_string = "<Table><Claimable>false</Claimable><MinorRev>80601</MinorRev><Operation>530600 ION MILL</Operation><HTNum>162</HTNum><WaferEC>80318</WaferEC><HolderType>HACARR</HolderType><Job>167187008</Job></Table>"
root = ET.fromstring(xml_string)
for child in root:
    print("%s %s" % (child.tag, child.text))
This should give you the following output:
Claimable false
MinorRev 80601
Operation 530600 ION MILL
HTNum 162
WaferEC 80318
HolderType HACARR
Job 167187008
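To make the difference between the two concrete, here is a small Python 3 sketch. The unit attribute is invented for the example; the elements in the question have no attributes at all, which is why attrib comes back empty for them.

```python
import xml.etree.ElementTree as ET

# Hypothetical element: "unit" is an attribute, "162" is the text content.
node = ET.fromstring('<HTNum unit="count">162</HTNum>')

print(node.attrib)  # attributes from the start tag, as a dict
print(node.text)    # character data between the opening and closing tags

# An element shaped like those in the question: no attributes, only text.
plain = ET.fromstring("<HTNum>162</HTNum>")
print(plain.attrib, plain.text)
```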
Upvotes: 3