Tristan Tran

Reputation: 1513

Does ElementTree generate its own nsmap while lxml.etree does not?

This thread explains the nature of nsmap in lxml.etree. Given the following XML and parsing code, I used ElementTree.dump and etree.dump to view the result. The output from ET shows various namespace prefixes (ns1, ns2, etc.). Does this mean that ET actually generates its own internal namespace mapping? If so, can we use it, and how, for example to search for an element whose name we know but whose URI we do not?

my_xml.xml

<?xml version="1.0" encoding="UTF-8"?>
<data>
  <country name="Liechtenstein" xmlns="aaa:bbb:ccc:liechtenstein:eee">
    <rank updated="yes">2</rank>
    <holidays>
      <christmas>Yes</christmas>
    </holidays>
    <year>2008</year>
    <gdppc>141100</gdppc>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore" xmlns="aaa:bbb:ccc:singapore:eee">
    <continent>Asia</continent>
    <holidays>
      <christmas>Yes</christmas>
    </holidays>
    <rank updated="yes">5</rank>
    <year>2011</year>
    <gdppc>59900</gdppc>
    <neighbor name="Malaysia" direction="N"/>
  </country>
  <country name="Panama" xmlns="aaa:bbb:ccc:panama:eee">
    <rank updated="yes">69</rank>
    <year>2011</year>
    <gdppc>13600</gdppc>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
</data>

parsing.py

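import xml.etree.ElementTree as ET
import lxml.etree as etree

# Dump the same file with both libraries
ET.dump(ET.parse('my_xml.xml'))
tree = etree.parse('my_xml.xml')
root = tree.getroot()
etree.dump(root)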

ET.dump

<ns0:data xmlns:ns0="aaa:bbb:ccc:ddd:eee" xmlns:ns1="aaa:bbb:ccc:liechtenstein:eee" xmlns:ns2="aaa:bbb:ccc:singapore:eee" xmlns:ns3="aaa:bbb:ccc:panama:eee">
  <ns1:country name="Liechtenstein">
    <ns1:rank updated="yes">2</ns1:rank>
    <ns1:year>2008</ns1:year>
    <ns1:gdppc>141100</ns1:gdppc>
    <ns1:neighbor name="Austria" direction="E" />
    <ns1:neighbor name="Switzerland" direction="W" />
  </ns1:country>
  <ns2:country name="Singapore">
    <ns2:continent>Asia</ns2:continent>
    <ns2:holidays>
      <ns2:christmas>Yes</ns2:christmas>
    </ns2:holidays>
    <ns2:rank updated="yes">5</ns2:rank>
    <ns2:year>2011</ns2:year>
    <ns2:gdppc>59900</ns2:gdppc>
    <ns2:neighbor name="Malaysia" direction="N" />
  </ns2:country>
  <ns3:country name="Panama">
    <ns3:rank updated="yes">69</ns3:rank>
    <ns3:year>2011</ns3:year>
    <ns3:gdppc>13600</ns3:gdppc>
    <ns3:neighbor name="Costa Rica" direction="W" />
    <ns3:neighbor name="Colombia" direction="E" />
  </ns3:country>
</ns0:data>

etree.dump

<data xmlns="aaa:bbb:ccc:ddd:eee">
  <country xmlns="aaa:bbb:ccc:liechtenstein:eee" name="Liechtenstein">
    <rank updated="yes">2</rank>
    <year>2008</year>
    <gdppc>141100</gdppc>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country xmlns="aaa:bbb:ccc:singapore:eee" name="Singapore">
    <continent>Asia</continent>
    <holidays>
      <christmas>Yes</christmas>
    </holidays>
    <rank updated="yes">5</rank>
    <year>2011</year>
    <gdppc>59900</gdppc>
    <neighbor name="Malaysia" direction="N"/>
  </country>
  <country xmlns="aaa:bbb:ccc:panama:eee" name="Panama">
    <rank updated="yes">69</rank>
    <year>2011</year>
    <gdppc>13600</gdppc>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
</data>

Upvotes: 2

Views: 842

Answers (1)

Mathias Müller

Reputation: 22637

ElementTree (ET) and lxml differ in the way they handle namespaces

ET generally does not preserve the namespace prefixes defined in an input document and does not store whether a namespace was a default namespace. ET also moves all namespace declarations to the outermost element.

lxml generally preserves prefixes and default namespaces.

For instance, if your input document is:

<root xmlns:myprefix="https://www.myuri.com">

or if the input has a default namespace:

<root xmlns="https://www.myuri.com">

then lxml would preserve everything about this document, while ET could turn it into

<root xmlns:ns0="https://www.myuri.com">

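The modifications by ET do not change the semantics of the document. If you would like to prevent them, the only solution I know of is to register all namespaces with ET.register_namespace. The registry is only consulted when the tree is written out, so registering before parsing works, but any point before serialization is enough.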
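
A minimal sketch of the registration approach, using only the standard library. The child element and the variable names are my own additions for illustration: ET only writes out namespaces that some element or attribute actually uses, so the input needs at least one prefixed element for the effect to be visible.

```python
import xml.etree.ElementTree as ET

# Illustrative input: the prefix from the example above, plus a child that uses it
SRC = '<root xmlns:myprefix="https://www.myuri.com"><myprefix:child/></root>'
root = ET.fromstring(SRC)

# Without registration, ET invents a prefix (ns0, ns1, ...) when serializing
default_out = ET.tostring(root, encoding="unicode")
print(default_out)

# register_namespace updates a global registry that is consulted on output only
ET.register_namespace("myprefix", "https://www.myuri.com")
registered_out = ET.tostring(root, encoding="unicode")
print(registered_out)
```

Note that the registry is process-wide, so a prefix registered here applies to every tree serialized afterwards.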

The ET namespace mapping is not superior or more useful

Does this mean that ET actually generates its own internal namespace mapping? If so, can we use it, and how, for example to search for an element whose name we know but whose URI we do not?

No, I do not think the ET namespace mapping is any more (or less) useful than the one you get from lxml. In both cases, essentially the same namespaces are defined and the same mapping is used.
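
To illustrate why the mapping is the same: ET stores each element's namespace URI directly in its tag, in the form {uri}localname, and the ns0, ns1, ... prefixes only appear when the tree is written out. The declarations made in the input can still be read through iterparse with start-ns events. A sketch with a shortened stand-in for the document in the question (the XML constant and the variable names are mine):

```python
import io
import xml.etree.ElementTree as ET

XML = b"""<data>
  <country name="Liechtenstein" xmlns="aaa:bbb:ccc:liechtenstein:eee"><rank>2</rank></country>
  <country name="Singapore" xmlns="aaa:bbb:ccc:singapore:eee"><rank>5</rank></country>
</data>"""

# Each start-ns event carries a (prefix, uri) pair; '' marks a default namespace
declared = [pair for _, pair in ET.iterparse(io.BytesIO(XML), events=("start-ns",))]
print(declared)

# The parsed elements carry the URI in the tag itself; no prefix is stored
tags = [el.tag for el in ET.fromstring(XML).iter()]
print(tags)
```

So the information available is a list of URIs per element, which is what lxml gives you as well, only with the original prefixes attached.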

As I have mentioned earlier, it is uncommon to know the name of an element but not its namespace URI. If the namespace URI of elements is unpredictable and not known a priori, this is a reason to

  • change your XML format (if you yourself are in charge)
  • complain about this type of document and insist that it be changed (if someone else is in charge)

In any case, in the XML document you are showing, namespaces are certainly not used in the intended way. There is no need for each country element to reside in its own namespace.
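
If you are nevertheless stuck with such a document, a search by local name alone is possible. A sketch, assuming Python 3.8 or later for the {*} wildcard in ET's find functions; the second variant compares local names by hand and does not need the wildcard. The XML constant is a shortened stand-in for the file in the question:

```python
import xml.etree.ElementTree as ET

XML = """<data>
  <country name="Liechtenstein" xmlns="aaa:bbb:ccc:liechtenstein:eee"><rank>2</rank></country>
  <country name="Singapore" xmlns="aaa:bbb:ccc:singapore:eee"><rank>5</rank></country>
</data>"""
root = ET.fromstring(XML)

# {*} matches any namespace (or none) in find/findall/iterfind
ranks = root.findall(".//{*}rank")
print([el.text for el in ranks])

# Manual variant: strip the {uri} part of each tag and compare the rest
by_local = [el for el in root.iter() if el.tag.rpartition("}")[2] == "rank"]
print([el.tag for el in by_local])
```

With lxml, the XPath expression //*[local-name()="rank"] passed to root.xpath() should give the same result; that variant is not exercised in the sketch above.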

Upvotes: 2
