How to add different namespaces to node attributes in lxml

Question

I'm using lxml to generate an XML file such as the one below. The documentation and other questions (1, 2) here on Stack Overflow nudged me in the right direction. What I'm struggling with are namespace prefixes on attributes, such as those in the markList and mark nodes.

This is what I have so far. As you can see from the output below, I'm stuck at the markList node and have been banging my head against this for a while now. Any further nudges would be much appreciated.

from lxml import etree

class XMLNamespaces:
   xlink = "http://www.w3.org/1999/xlink"
   xml = "text.xml"

top = etree.Element("paula", {"version":"1.1"})
header = etree.SubElement(top, "header", {"paula_id": "annotation.mark"})
mark_list = etree.SubElement(top, "markList", {
    etree.QName(XMLNamespaces.xlink, "xlink"): "http://www.w3.org/1999/xlink",
    "type": "Annotation",
    etree.QName(XMLNamespaces.xml, "xml"): "http://www.w3.org/1999/xlink",
})

body = etree.SubElement(top, "body")    
body.text = "test body"
    
print(etree.tounicode(top, pretty_print=True))

Here is my current output:


  
  
  test body

mzjn · Accepted Answer

Here is a way to do it:

from lxml import etree

class XMLNamespaces:
    xlink = "http://www.w3.org/1999/xlink"
    # The xml prefix is always bound to this URI.
    xml = "http://www.w3.org/XML/1998/namespace"

top = etree.Element("paula", {"version": "1.1"})

header = etree.SubElement(top, "header", {"paula_id": "annotation.mark"})

# nsmap declares the xlink prefix on markList, so its children reuse it.
mark_list = etree.SubElement(top, "markList",
                             {"type": "Annotation",
                              etree.QName(XMLNamespaces.xml, "base"): "text.xml"},
                             nsmap={"xlink": XMLNamespaces.xlink})

# QName(namespace_uri, local_name) builds a namespaced attribute name.
mark = etree.SubElement(mark_list, "mark",
                        {"id": "span1",
                         etree.QName(XMLNamespaces.xlink, "href"): "#sTok1"})

mark = etree.SubElement(mark_list, "mark",
                        {"id": "span2",
                         etree.QName(XMLNamespaces.xlink, "href"): "#sTok2"})

# tostring(..., encoding="unicode") replaces the deprecated etree.tounicode().
print(etree.tostring(top, encoding="unicode", pretty_print=True))

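Output (with a recent lxml on Python 3; older versions may order the attributes differently):

<paula version="1.1">
  <header paula_id="annotation.mark"/>
  <markList xmlns:xlink="http://www.w3.org/1999/xlink" type="Annotation" xml:base="text.xml">
    <mark id="span1" xlink:href="#sTok1"/>
    <mark id="span2" xlink:href="#sTok2"/>
  </markList>
</paula>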

Comments:

nsmap={"xlink": XMLNamespaces.xlink} on the markList subelement ensures that the xlink prefix, rather than an auto-generated one such as ns0, is used in the output.
The URI for the xml prefix is http://www.w3.org/XML/1998/namespace. This URI is special: the xml prefix is bound to it by definition, so it does not need to be declared in the XML file, but it must still be used in the Python code to build the attribute name.
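
The two comments above can be checked with a small sketch. It assumes a recent lxml on Python 3; the helper build_mark_list and the constants XLINK and XML are illustrative names, not part of the answer. It serializes the same markList element once with the nsmap argument and once without it:

```python
from lxml import etree

XLINK = "http://www.w3.org/1999/xlink"
XML = "http://www.w3.org/XML/1998/namespace"


def build_mark_list(nsmap=None):
    """Serialize a markList element that has one namespaced mark child."""
    mark_list = etree.Element(
        "markList",
        {etree.QName(XML, "base"): "text.xml"},
        nsmap=nsmap,
    )
    etree.SubElement(mark_list, "mark", {etree.QName(XLINK, "href"): "#sTok1"})
    return etree.tostring(mark_list, encoding="unicode")


# With nsmap, the xlink prefix is declared on markList and reused by mark.
with_prefix = build_mark_list(nsmap={"xlink": XLINK})

# Without nsmap, lxml has to invent a prefix for the xlink namespace.
without_prefix = build_mark_list()

print(with_prefix)
print(without_prefix)
```

The first string should use xlink:href, while the second should fall back on a generated prefix. Neither should contain an xmlns:xml declaration, even though both carry xml:base.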
