Reputation: 11
I have a CSV file where one of the columns contains XML. I'd like to parse that XML into separate columns and re-save the file. I am trying to do this with Python, but I am not having much luck. I have looked at similar problems on Stack Exchange, but I still don't know how to approach it.
Thank you for your help in advance!
K
Upvotes: 0
Views: 669
Reputation: 9422
ElementTree is an XML parser in the Python standard library ( https://docs.python.org/2/library/xml.etree.elementtree.html ).
Read each XML cell from the CSV as a string, parse it, then iterate through the resulting elements and write out the values you need:
    from xml.etree.ElementTree import XML

    # Replace this literal with the variable that holds the XML string from your CSV cell.
    parsed = XML('''
    <root>
      <group>
        <child id="a">This is child "a".</child>
        <child id="b">This is child "b".</child>
      </group>
      <group>
        <child id="c">This is child "c".</child>
      </group>
    </root>
    ''')

    print('parsed =', parsed)

    # Iterating over an element yields its direct children (here, the <group> elements).
    for elem in parsed:
        print(elem.tag)
        if elem.text is not None and elem.text.strip():
            print('  text: "%s"' % elem.text)
        if elem.tail is not None and elem.tail.strip():
            print('  tail: "%s"' % elem.tail)
        for name, value in sorted(elem.attrib.items()):
            print('  %-4s = "%s"' % (name, value))
        print()
Source: https://pymotw.com/2/xml/etree/ElementTree/parse.html#parsing-strings
Alternatively, you can convert the XML cells to CSV columns directly, as described here:
http://blog.appliedinformaticsinc.com/how-to-parse-and-convert-xml-to-csv-using-python/
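To tie this back to the CSV in the question, here is a minimal sketch of the whole round trip in Python 3. It assumes each XML cell is a small document whose root has flat, uniquely named children that do not clash with existing column names; the column name `payload`, the sample data, and the helper names `expand_xml_column` and `convert` are made up for illustration, so adjust them to your file.

```python
import csv
import io
import xml.etree.ElementTree as ET


def expand_xml_column(rows, xml_column):
    """Replace one XML-valued column with one column per child element.

    Assumption: each cell looks like <r><a>1</a><b>2</b></r>, i.e. the
    root has flat children and each child tag becomes a column name.
    """
    expanded, new_fields = [], []
    for row in rows:
        row = dict(row)  # copy so the caller's row is not mutated
        cell = row.pop(xml_column, "") or ""
        try:
            children = list(ET.fromstring(cell)) if cell.strip() else []
        except ET.ParseError:
            children = []  # malformed XML: leave the new columns empty
        for child in children:
            row[child.tag] = (child.text or "").strip()
            if child.tag not in new_fields:
                new_fields.append(child.tag)
        expanded.append(row)
    return expanded, new_fields


def convert(src, dst, xml_column):
    """Read CSV from src, expand xml_column, and write CSV to dst."""
    reader = csv.DictReader(src)
    rows, new_fields = expand_xml_column(reader, xml_column)
    kept = [f for f in (reader.fieldnames or []) if f != xml_column]
    writer = csv.DictWriter(dst, fieldnames=kept + new_fields, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical sample data standing in for the real file.
sample = (
    "id,payload\n"
    '1,"<r><name>Ann</name><age>30</age></r>"\n'
    '2,"<r><name>Bob</name></r>"\n'
)
out = io.StringIO()
convert(io.StringIO(sample), out, "payload")
print(out.getvalue())
```

For real files, pass file objects opened with `open(path, newline="")` in place of the `io.StringIO` objects. If your XML is nested or uses attributes rather than child text, the loop over `children` is the part to change.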
Upvotes: 1