Reputation: 25
I'm using lxml to generate an XML file such as the one below. The documentation and other questions (1, 2) here on Stack Overflow nudged me in the right direction. What I'm struggling with are namespace prefixes such as those in the markList and mark nodes.
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE paula SYSTEM "paula_mark.dtd">
<paula version="1.1">
  <header paula_id="Layer_Annotation.0_0000.mark"/>
  <markList xmlns:xlink="http://www.w3.org/1999/xlink" type="Annotation" xml:base="text.xml">
    <!--foo-->
    <mark id="span1" xlink:href="#sTok1"/>
    <!--bar-->
    <mark id="span2" xlink:href="#sTok2"/>
  </markList>
</paula>
This is what I've got so far. As you can see from the output below, I'm stuck at the markList node and have been banging my head against this for a while now. Any further nudges would be much appreciated.
from lxml import etree

class XMLNamespaces:
    xlink = "http://www.w3.org/1999/xlink"
    xml = "text.xml"

top = etree.Element("paula", {"version":"1.1"})
header = etree.SubElement(top, "header", {"paula_id": "annotation.mark"})
mark_list = etree.SubElement(top, "markList", {
    etree.QName(XMLNamespaces.xlink, "xlink"): "http://www.w3.org/1999/xlink",
    "type": "Annotation",
    etree.QName(XMLNamespaces.xml, "xml"): "http://www.w3.org/1999/xlink",
})
body = etree.SubElement(top, "body")
body.text = "test body"

print(etree.tounicode(top, pretty_print=True))
Here is my current output:
<paula version="1.1">
  <header paula_id="annotation.mark"/>
  <markList xmlns:ns0="http://www.w3.org/1999/xlink" xmlns:ns1="text.xml" ns0:xlink="http://www.w3.org/1999/xlink" type="Annotation" ns1:xml="http://www.w3.org/1999/xlink"/>
  <body>test body</body>
</paula>
Upvotes: 1
Views: 380
Reputation: 50947
Here is a way to do it:
from lxml import etree

class XMLNamespaces:
    xlink = "http://www.w3.org/1999/xlink"
    xml = "http://www.w3.org/XML/1998/namespace"

top = etree.Element("paula", {"version": "1.1"})
header = etree.SubElement(top, "header", {"paula_id": "annotation.mark"})

# nsmap declares the xlink prefix on markList; xml:base is an ordinary
# attribute whose name lives in the reserved XML namespace.
mark_list = etree.SubElement(top, "markList",
                             {"type": "Annotation",
                              etree.QName(XMLNamespaces.xml, "base"): "text.xml"},
                             nsmap={"xlink": XMLNamespaces.xlink})

# The mark children reuse the xlink prefix declared on their parent.
mark = etree.SubElement(mark_list, "mark",
                        {"id": "span1",
                         etree.QName(XMLNamespaces.xlink, "href"): "#sTok1"})
mark = etree.SubElement(mark_list, "mark",
                        {"id": "span2",
                         etree.QName(XMLNamespaces.xlink, "href"): "#sTok2"})

print(etree.tounicode(top, pretty_print=True))
Output:
<paula version="1.1">
  <header paula_id="annotation.mark"/>
  <markList xmlns:xlink="http://www.w3.org/1999/xlink" type="Annotation" xml:base="text.xml">
    <mark id="span1" xlink:href="#sTok1"/>
    <mark id="span2" xlink:href="#sTok2"/>
  </markList>
</paula>
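This output still lacks the XML declaration, the DOCTYPE and the comments shown in the target file in the question. The following is a minimal sketch of one way to add them, not something the code above already does. It assumes etree.Comment for the comment nodes and the xml_declaration and doctype arguments of etree.tostring. The constant names XLINK_NS and XML_NS and the loop over (comment, id, target) tuples are only illustrative.

```python
from lxml import etree

XLINK_NS = "http://www.w3.org/1999/xlink"
XML_NS = "http://www.w3.org/XML/1998/namespace"

top = etree.Element("paula", {"version": "1.1"})
etree.SubElement(top, "header", {"paula_id": "annotation.mark"})
mark_list = etree.SubElement(
    top, "markList",
    {"type": "Annotation", etree.QName(XML_NS, "base"): "text.xml"},
    nsmap={"xlink": XLINK_NS})

# Each mark is preceded by a comment node, as in the target file.
marks = [("foo", "span1", "#sTok1"), ("bar", "span2", "#sTok2")]
for note, mark_id, target in marks:
    mark_list.append(etree.Comment(note))
    etree.SubElement(mark_list, "mark",
                     {"id": mark_id, etree.QName(XLINK_NS, "href"): target})

# With a byte encoding such as UTF-8, tostring() returns bytes and can
# prepend the XML declaration and a DOCTYPE string.
document = etree.tostring(
    top,
    xml_declaration=True,
    encoding="UTF-8",
    doctype='<!DOCTYPE paula SYSTEM "paula_mark.dtd">',
    pretty_print=True)

print(document.decode("utf-8"))
```

To write a file instead of printing, the bytes in document can be written to a file opened in binary mode.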
Comments:
Passing nsmap={"xlink": XMLNamespaces.xlink} when creating the markList subelement ensures that the xlink prefix, and not a generated one such as ns0, is used in the output.
The namespace URI bound to the xml prefix is http://www.w3.org/XML/1998/namespace. This URI is a bit special since it does not need to be declared in the XML file, but it must be used in the Python code.
Upvotes: 1
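Both points can be checked on single elements. The sketch below is only an illustration with made-up variable names (auto, named, based). It serializes the same xlink:href attribute with and without an nsmap, and an xml:base attribute without any nsmap at all.

```python
from lxml import etree

XLINK_NS = "http://www.w3.org/1999/xlink"
XML_NS = "http://www.w3.org/XML/1998/namespace"
href = etree.QName(XLINK_NS, "href")

# No nsmap: lxml knows the namespace URI but no prefix for it, so it
# generates one (ns0 in the question's output).
auto = etree.Element("mark", {href: "#sTok1"})
auto_xml = etree.tostring(auto, encoding="unicode")

# With nsmap: the chosen prefix is declared on the element and reused
# for the attribute.
named = etree.Element("mark", {href: "#sTok1"}, nsmap={"xlink": XLINK_NS})
named_xml = etree.tostring(named, encoding="unicode")

# The xml prefix is reserved, so no xmlns:xml declaration is emitted.
based = etree.Element("markList", {etree.QName(XML_NS, "base"): "text.xml"})
based_xml = etree.tostring(based, encoding="unicode")

print(auto_xml)
print(named_xml)
print(based_xml)
```

Child elements created with etree.SubElement pick up a prefix declared on an ancestor, which is why the mark elements in the answer need no nsmap of their own.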