1252748

Reputation: 15379

Get children node names from one parent

Given the following XML structure

<cmap>
   <tableVersion version="0" />
   <cmap_format_4 platformID="0" platEncID="3" language="0">
      <map code="0x20" name="space" />
      <!-- SPACE -->
      <!--many, many more characters-->
   </cmap_format_4>
   <cmap_format_0 platformID="1" platEncID="0" language="0">
      <map code="0x0" name=".notdef" />
      <!--many, many more characters again-->
   </cmap_format_0>
   <cmap_format_4 platformID="0" platEncID="3" language="0">
      <!--"cmap_format_4" again-->
      <map code="0x20" name="space" />
      <!-- SPACE -->
      <!--more "map" nodes-->
   </cmap_format_4>
</cmap>

I'm trying to step through the tree and get a list of the node names at the cmap_format_n level, along with the code and name attributes of the map nodes beneath each of them.

Expected outcome

cmap_format_4
0x0   null
0xd   CR
0x20  space
etc...

cmap_format_0
0x0   notdef
0x1   notdeaf
etc...

So far I have

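import xml.etree.ElementTree as ET

charactersByFontString = "CODE\tCHAR DESC\n"
tree = ET.parse(xmlFile)  # xmlFile: path to the XML file shown above
root = tree.getroot()

# Collect the code/name attributes of every <map> element in the document
for map_node in root.iter("map"):
    charactersByFontString += map_node.attrib["code"] + "\t"
    charactersByFontString += map_node.attrib["name"] + "\n"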

That gets all of my codes and names. However, I cannot get the name of the cmap_format_n node that each one belongs to.

for child in root:
    print child.child

does not work, because tableVersion is the first child, is self-closing, and has no children. (I'm also not sure whether chaining child lookups like this works at all.) child.sibling gave me an error. How can I get the names of these cmap_format_n children?

Upvotes: 0

Views: 546

Answers (2)

bjimba

Reputation: 928

import xml.etree.ElementTree as ET

xmlFile = "pytest.xml"

out = "CODE\tCHAR DESC\n"

tree = ET.parse(xmlFile)
root = tree.getroot()

for child in root:
    # Skip <tableVersion>; only the cmap_format_N tables are wanted
    if child.tag.startswith('cmap_'):
        out += child.tag + '\n'
        # findall('map') limits the loop to direct <map> children
        for grandchild in child.findall('map'):
            out += grandchild.attrib["code"] + '\t'
            out += grandchild.attrib["name"] + '\n'
        out += '\n'

print(out)
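
If the tab-separated string is only an intermediate step, the same traversal could collect structured data instead. The following is a minimal sketch under a few assumptions: the XML is available as a string, and the names `collect_cmap_tables` and `SAMPLE` are made up for illustration. A list is used rather than a dict because the same table name (cmap_format_4 in the question) can appear more than once.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample modelled on the XML in the question.
SAMPLE = """
<cmap>
   <tableVersion version="0" />
   <cmap_format_4 platformID="0" platEncID="3" language="0">
      <map code="0x20" name="space" />
      <!-- SPACE -->
   </cmap_format_4>
   <cmap_format_0 platformID="1" platEncID="0" language="0">
      <map code="0x0" name=".notdef" />
   </cmap_format_0>
</cmap>
"""


def collect_cmap_tables(xml_text):
    """Return [(table_tag, [(code, name), ...]), ...] in document order."""
    root = ET.fromstring(xml_text)
    tables = []
    for child in root:
        # Ignore siblings such as <tableVersion>.
        if not child.tag.startswith("cmap_format_"):
            continue
        # Only direct <map> children; read their attributes with .get().
        pairs = [(m.get("code"), m.get("name")) for m in child.findall("map")]
        tables.append((child.tag, pairs))
    return tables


for tag, pairs in collect_cmap_tables(SAMPLE):
    print(tag)
    for code, name in pairs:
        print(code + "\t" + name)
    print()
```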

Upvotes: 0

Tomalak

Reputation: 338396

May I suggest XSLT for transforming your input XML?

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text" encoding="UTF-8" indent="yes" />

    <xsl:template match="/cmap">
        <xsl:apply-templates select="*[starts-with(name(), 'cmap_')]" />
    </xsl:template>

    <xsl:template match="*[starts-with(name(), 'cmap_')]">
        <xsl:value-of select="name()" />
        <xsl:text>&#xA;</xsl:text>
        <xsl:apply-templates select="map" />
        <xsl:text>&#xA;</xsl:text>
    </xsl:template>

    <xsl:template match="map">
        <xsl:apply-templates select="@code" />
        <xsl:text>&#x9;</xsl:text>
        <xsl:apply-templates select="@name" />
        <xsl:text>&#xA;</xsl:text>
    </xsl:template>
</xsl:stylesheet>

outputs (http://xsltransform.net/bFDb2Cu)

cmap_format_4
0x20    space

cmap_format_0
0x0 .notdef

cmap_format_4
0x20    space

An example of how to use XSLT from within Python can be found over here.
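
For reference, one common way to run a stylesheet from Python is the third-party lxml package (the standard library's ElementTree has no XSLT support). The following is a minimal sketch and not necessarily what the linked example shows. It assumes lxml is installed, uses a condensed variant of the stylesheet above, and the names `transform_with_xslt`, `XSLT_TEXT` and `SAMPLE_XML` are made up for illustration.

```python
# Condensed, illustrative variant of the stylesheet from this answer.
XSLT_TEXT = """\
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text" />
    <xsl:template match="/cmap">
        <xsl:for-each select="*[starts-with(name(), 'cmap_')]">
            <xsl:value-of select="name()" /><xsl:text>&#xA;</xsl:text>
            <xsl:for-each select="map">
                <xsl:value-of select="@code" /><xsl:text>&#x9;</xsl:text>
                <xsl:value-of select="@name" /><xsl:text>&#xA;</xsl:text>
            </xsl:for-each>
            <xsl:text>&#xA;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
"""

# Hypothetical, trimmed-down input modelled on the XML in the question.
SAMPLE_XML = """\
<cmap>
   <tableVersion version="0" />
   <cmap_format_0 platformID="1" platEncID="0" language="0">
      <map code="0x0" name=".notdef" />
   </cmap_format_0>
</cmap>
"""


def transform_with_xslt(xml_text, xslt_text):
    """Apply an XSLT 1.0 stylesheet to an XML string and return the text output."""
    # Imported here because lxml is a third-party dependency (pip install lxml).
    from lxml import etree

    transform = etree.XSLT(etree.fromstring(xslt_text))
    result = transform(etree.fromstring(xml_text))
    return str(result)


if __name__ == "__main__":
    print(transform_with_xslt(SAMPLE_XML, XSLT_TEXT))
```

To use stylesheet and input files instead of strings, `etree.parse(path)` can replace the `etree.fromstring(...)` calls.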

I'm not saying that it can't be done the way you attempt it (DOM traversal) - it most definitely can - XSLT is just a much more natural fit for the task.

Upvotes: 1
