Reputation: 15379
Given the following XML structure
<cmap>
  <tableVersion version="0" />
  <cmap_format_4 platformID="0" platEncID="3" language="0">
    <map code="0x20" name="space" />
    <!-- SPACE -->
    <!--many, many more characters-->
  </cmap_format_4>
  <cmap_format_0 platformID="1" platEncID="0" language="0">
    <map code="0x0" name=".notdef" />
    <!--many, many more characters again-->
  </cmap_format_0>
  <cmap_format_4 platformID="0" platEncID="3" language="0">
    <!--"cmap_format_4" again-->
    <map code="0x20" name="space" />
    <!-- SPACE -->
    <!--more "map" nodes-->
  </cmap_format_4>
</cmap>
I'm trying to step through and get a list of the node names at the cmap_format_n level, along with the code and name attributes of all the map nodes beneath them, like this:
cmap_format_4
0x0 null
0xd CR
0x20 space
etc...
cmap_format_0
0x0 notdef
0x1 notdeaf
etc...
So far I have
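import xml.etree.ElementTree as ET

charactersByFontString = "CODE\tCHAR DESC\n"
tree = ET.parse(xmlFile)
root = tree.getroot()
# iter("map") walks every <map> element, regardless of its parent
for map_node in root.iter("map"):
    charactersByFontString += map_node.attrib["code"] + "\t"
    charactersByFontString += map_node.attrib["name"] + "\n"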
That gets all of my codes and names. However, I cannot get the name of the parent cmap_format_n element.
for child in root:
    print child.child
does not work, since tableVersion is the first child of the root, is self-closing, and has no children. (I'm also not sure whether chaining child lookups like that works at all.) child.sibling raised an error. How can I get the map children grouped under the name of their cmap_format_n parent?
Upvotes: 0
Views: 546
Reputation: 928
import xml.etree.ElementTree as ET

xmlFile = "pytest.xml"
out = "CODE\tCHAR DESC\n"
tree = ET.parse(xmlFile)
root = tree.getroot()
for child in root:
    # skip tableVersion; only the cmap_format_n tables start with "cmap"
    if child.tag[:4] == 'cmap':
        out += child.tag + '\n'
        for grandchild in child:
            out += grandchild.attrib["code"] + '\t'
            out += grandchild.attrib["name"] + '\n'
        out += '\n'
print(out)
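If you would rather keep the data around than build one big string, here is a rough sketch of the same traversal that collects the tables first and formats them afterwards. The inline SAMPLE string and the helper names collect_tables and format_tables are my own placeholders, not anything from your code, and I'm assuming every map element carries both a code and a name attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that mirrors the structure shown in the question.
SAMPLE = """<cmap>
  <tableVersion version="0" />
  <cmap_format_4 platformID="0" platEncID="3" language="0">
    <map code="0x20" name="space" /><!-- SPACE -->
  </cmap_format_4>
  <cmap_format_0 platformID="1" platEncID="0" language="0">
    <map code="0x0" name=".notdef" />
  </cmap_format_0>
</cmap>"""


def collect_tables(root):
    """Return [(table_tag, [(code, name), ...]), ...] in document order."""
    tables = []
    for child in root:
        # tableVersion (and anything else that is not a table) is skipped.
        if not child.tag.startswith("cmap_format_"):
            continue
        # findall("map") only matches direct <map> children of this table.
        pairs = [(m.get("code"), m.get("name")) for m in child.findall("map")]
        tables.append((child.tag, pairs))
    return tables


def format_tables(tables):
    """Render the collected tables as tab-separated text."""
    lines = ["CODE\tCHAR DESC"]
    for tag, pairs in tables:
        lines.append(tag)
        lines.extend(f"{code}\t{name}" for code, name in pairs)
        lines.append("")  # blank line between tables
    return "\n".join(lines)


root = ET.fromstring(SAMPLE)
tables = collect_tables(root)
print(format_tables(tables))
```

For a file on disk, ET.parse(xmlFile).getroot() can replace the ET.fromstring call. Note that the default ElementTree parser drops XML comments, so the comment nodes never show up while iterating.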
Upvotes: 0
Reputation: 338396
May I suggest XSLT for transforming your input XML?
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text" encoding="UTF-8" indent="yes" />

  <xsl:template match="/cmap">
    <xsl:apply-templates select="*[starts-with(name(), 'cmap_')]" />
  </xsl:template>

  <xsl:template match="*[starts-with(name(), 'cmap_')]">
    <xsl:value-of select="name()" />
    <xsl:text>&#xA;</xsl:text>
    <xsl:apply-templates select="map" />
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>

  <xsl:template match="map">
    <xsl:apply-templates select="@code" />
    <xsl:text>&#x9;</xsl:text>
    <xsl:apply-templates select="@name" />
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
outputs (http://xsltransform.net/bFDb2Cu)
cmap_format_4
0x20	space

cmap_format_0
0x0	.notdef

cmap_format_4
0x20	space
An example of how to use XSLT from within Python can be found over here.
I'm not saying it can't be done the way you are attempting it (DOM traversal); it most definitely can. XSLT is just a much more natural fit for the task.
Upvotes: 1