Reputation: 22952
Is there a way, using lxml, to insert XML attributes with the right namespace?
For instance, I want to use XLink to insert links in an XML document. All I need to do is insert {http://www.w3.org/1999/xlink}href attributes in some elements. I would like to use the xlink prefix, but lxml generates prefixes like "ns0", "ns1"…
Here is what I tried:
from lxml import etree
#: Name (and namespace) of the *href* attribute used to insert links.
HREF_ATTR = etree.QName("http://www.w3.org/1999/xlink", "href").text
content = """\
<body>
<p>Link to <span>StackOverflow</span></p>
<p>Link to <span>Google</span></p>
</body>
"""
targets = ["https://stackoverflow.com", "https://www.google.fr"]
body_elem = etree.XML(content)
for span_elem, target in zip(body_elem.iter("span"), targets):
    span_elem.attrib[HREF_ATTR] = target
etree.dump(body_elem)
The dump looks like this:
<body>
<p>Link to <span xmlns:ns0="http://www.w3.org/1999/xlink"
ns0:href="https://stackoverflow.com">StackOverflow</span></p>
<p>Link to <span xmlns:ns1="http://www.w3.org/1999/xlink"
ns1:href="https://www.google.fr">Google</span></p>
</body>
I found a way to declare the namespace only once, by inserting and then deleting an attribute on the root element, like this:
# trick to declare the XLink namespace globally (only one time).
body_elem = etree.XML(content)
body_elem.attrib[HREF_ATTR] = ""
del body_elem.attrib[HREF_ATTR]
targets = ["https://stackoverflow.com", "https://www.google.fr"]
for span_elem, target in zip(body_elem.iter("span"), targets):
    span_elem.attrib[HREF_ATTR] = target
etree.dump(body_elem)
It's ugly, but it works and I only need to do it once. I get:
<body xmlns:ns0="http://www.w3.org/1999/xlink">
<p>Link to <span ns0:href="https://stackoverflow.com">StackOverflow</span></p>
<p>Link to <span ns0:href="https://www.google.fr">Google</span></p>
</body>
But the problem remains: how can I turn this "ns0" prefix into "xlink"?
Upvotes: 1
Views: 698
Reputation: 22952
Using register_namespace, as suggested by @mzjn:
etree.register_namespace("xlink", "http://www.w3.org/1999/xlink")
# trick to declare the XLink namespace globally (only one time).
body_elem = etree.XML(content)
body_elem.attrib[HREF_ATTR] = ""
del body_elem.attrib[HREF_ATTR]
targets = ["https://stackoverflow.com", "https://www.google.fr"]
for span_elem, target in zip(body_elem.iter("span"), targets):
    span_elem.attrib[HREF_ATTR] = target
etree.dump(body_elem)
The result is what I expected:
<body xmlns:xlink="http://www.w3.org/1999/xlink">
<p>Link to <span xlink:href="https://stackoverflow.com">StackOverflow</span></p>
<p>Link to <span xlink:href="https://www.google.fr">Google</span></p>
</body>
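A possible alternative to the insert-and-delete trick, as a sketch only: etree.cleanup_namespaces() takes a top_nsmap argument (lxml 3.5 and later, as far as I know) that declares the given prefixes on the root element and then drops redundant declarations further down the tree. The name XLINK_NS is one I introduce here for illustration, and the result variable only exists to make the output easy to inspect.

```python
from lxml import etree

# Hypothetical constant name, introduced for this sketch.
XLINK_NS = "http://www.w3.org/1999/xlink"
HREF_ATTR = etree.QName(XLINK_NS, "href").text

content = """\
<body>
  <p>Link to <span>StackOverflow</span></p>
  <p>Link to <span>Google</span></p>
</body>
"""
targets = ["https://stackoverflow.com", "https://www.google.fr"]

body_elem = etree.XML(content)

# Set the attributes first; lxml picks generated prefixes at this point.
for span_elem, target in zip(body_elem.iter("span"), targets):
    span_elem.set(HREF_ATTR, target)

# Declare xlink on the root, then let lxml remap the per-element
# declarations to it and remove the ones that became redundant.
etree.cleanup_namespaces(body_elem, top_nsmap={"xlink": XLINK_NS})

result = etree.tostring(body_elem, encoding="unicode")
print(result)
```

With this approach there is no need to call register_namespace, because the prefix comes from top_nsmap, and the attributes can be set before the namespace is declared on the root.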
Upvotes: 2