ambaraba

Reputation: 25

How to add different namespaces to node attributes in lxml

I'm using lxml to generate an XML file such as the one below. The documentation and other questions (1, 2) here on Stack Overflow nudged me in the right direction. What I'm struggling with are the namespace prefixes, such as xmlns:xlink and xml:base on the markList node and xlink:href on the mark nodes.

<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE paula SYSTEM "paula_mark.dtd">
<paula version="1.1">
    <header paula_id="Layer_Annotation.0_0000.mark"/>
    <markList xmlns:xlink="http://www.w3.org/1999/xlink" type="Annotation" xml:base="text.xml">
        <!--foo-->
        <mark id="span1" xlink:href="#sTok1"/>
        <!--bar-->
        <mark id="span2" xlink:href="#sTok2"/>
    </markList>
</paula>

This is what I have so far. As the output below shows, I'm stuck at the markList node and have been banging my head against this for a while now. Any further nudges would be much appreciated.

from lxml import etree

class XMLNamespaces:
   xlink = "http://www.w3.org/1999/xlink"
   xml = "text.xml"

top = etree.Element("paula", {"version":"1.1"})
header = etree.SubElement(top, "header", {"paula_id": "annotation.mark"})
mark_list = etree.SubElement(top, "markList", {
    etree.QName(XMLNamespaces.xlink, "xlink"): "http://www.w3.org/1999/xlink",
    "type": "Annotation",
    etree.QName(XMLNamespaces.xml, "xml"): "http://www.w3.org/1999/xlink",
})

body = etree.SubElement(top, "body")    
body.text = "test body"
    
print(etree.tounicode(top, pretty_print=True))

Here is my current output:

<paula version="1.1">
  <header paula_id="annotation.mark"/>
  <markList xmlns:ns0="http://www.w3.org/1999/xlink" xmlns:ns1="text.xml" ns0:xlink="http://www.w3.org/1999/xlink" type="Annotation" ns1:xml="http://www.w3.org/1999/xlink"/>
  <body>test body</body>
</paula>

Upvotes: 1

Views: 380

Answers (1)

mzjn

Reputation: 50947

Here is a way to do it:

from lxml import etree

class XMLNamespaces:
    xlink = "http://www.w3.org/1999/xlink"
    # The xml prefix is always bound to this reserved URI
    xml = "http://www.w3.org/XML/1998/namespace"

top = etree.Element("paula", {"version": "1.1"})

header = etree.SubElement(top, "header", {"paula_id": "annotation.mark"})

# nsmap declares the xlink prefix on markList; QName builds the xml:base attribute name
mark_list = etree.SubElement(top, "markList",
                             {"type": "Annotation",
                              etree.QName(XMLNamespaces.xml, "base"): "text.xml"},
                             nsmap={"xlink": XMLNamespaces.xlink})

# Child elements reuse the xlink prefix declared on markList
mark = etree.SubElement(mark_list, "mark",
                        {"id": "span1",
                         etree.QName(XMLNamespaces.xlink, "href"): "#sTok1"})

mark = etree.SubElement(mark_list, "mark",
                        {"id": "span2",
                         etree.QName(XMLNamespaces.xlink, "href"): "#sTok2"})

# tostring(..., encoding="unicode") replaces the deprecated tounicode()
print(etree.tostring(top, encoding="unicode", pretty_print=True))

Output:

<paula version="1.1">
  <header paula_id="annotation.mark"/>
  <markList xmlns:xlink="http://www.w3.org/1999/xlink" type="Annotation" xml:base="text.xml">
    <mark id="span1" xlink:href="#sTok1"/>
    <mark id="span2" xlink:href="#sTok2"/>
  </markList>
</paula>

Comments:

  • nsmap={"xlink": XMLNamespaces.xlink} on the markList subelement ensures that the xlink prefix, and not an auto-generated one such as ns0, is used in the output.
  • The URI for the xml prefix is http://www.w3.org/XML/1998/namespace. This URI is special: the xml prefix is bound to it by definition, so it does not need to be declared in the XML file, but it must be used in the Python code to build attribute names such as xml:base.
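
The target file in the question also has an XML declaration, a DOCTYPE and comments, which the code above does not produce. The following is only a sketch of one way to add them, assuming the doctype argument of etree.tostring and etree.Comment are acceptable for your use case. The helper name build_paula and its spans parameter are made up for illustration and are not part of lxml.

```python
from lxml import etree

XLINK_NS = "http://www.w3.org/1999/xlink"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # reserved URI of the xml prefix


def build_paula(spans):
    """Hypothetical helper: `spans` is a list of (comment, mark_id, href) tuples."""
    top = etree.Element("paula", {"version": "1.1"})
    etree.SubElement(top, "header", {"paula_id": "Layer_Annotation.0_0000.mark"})
    mark_list = etree.SubElement(
        top,
        "markList",
        {"type": "Annotation", etree.QName(XML_NS, "base"): "text.xml"},
        nsmap={"xlink": XLINK_NS},  # declares xmlns:xlink on markList
    )
    for comment, mark_id, href in spans:
        # Each mark is preceded by a comment node, as in the target file
        mark_list.append(etree.Comment(comment))
        etree.SubElement(
            mark_list,
            "mark",
            {"id": mark_id, etree.QName(XLINK_NS, "href"): href},
        )
    return top


top = build_paula([("foo", "span1", "#sTok1"), ("bar", "span2", "#sTok2")])

# Serialize to bytes with an XML declaration and the DOCTYPE from the question
xml_bytes = etree.tostring(
    top,
    xml_declaration=True,
    encoding="UTF-8",
    doctype='<!DOCTYPE paula SYSTEM "paula_mark.dtd">',
    pretty_print=True,
)
print(xml_bytes.decode("utf-8"))
```

Note that tostring returns bytes when a byte encoding such as UTF-8 is requested, so the result can be written to a file opened in binary mode. If the nsmap argument were left out, lxml would fall back to a generated prefix, as with ns0 in the question's output.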

Upvotes: 1
