Chris W.

Reputation: 302

Changing an existing namespaced attribute with lxml

I have an existing XML document and I'd like to change a namespaced attribute to another value.

I have this:

<ac:structured-macro ac:name="center">
  <ac:rich-text-body>
    <p>
      some text
    </p>
  </ac:rich-text-body>
</ac:structured-macro>

I would like to turn the above into this:

<ac:structured-macro ac:name="new_center">
  <ac:rich-text-body>
    <p>
      some text
    </p>
  </ac:rich-text-body>
</ac:structured-macro>

This Python code:

from lxml import etree

pagexml = """<ac:structured-macro ac:name="center"> <ac:rich-text-body> <p> some text </p> </ac:rich-text-body> </ac:structured-macro>"""
prefix_map = {"ac": "http://www.atlassian.com/schema/confluence/4/ac/",
              "ri": "http://www.atlassian.com/schema/confluence/4/ri/"}
parser = etree.XMLParser(recover=True)
root = etree.fromstring(pagexml, parser)

for action, elem in etree.iterwalk(root, events=("end",)):
    if elem.tag == "ac:structured-macro":
        if elem.get("ac:name") == "center":
            elem.set("{ac}name", "new_center")
print(etree.tostring(root, pretty_print=True, encoding=str))

Produces this:

<ac:structured-macro xmlns:ns0="ac" ac:name="center" ns0:name="new_center">
  <ac:rich-text-body>
    <p>
      some text
    </p>
  </ac:rich-text-body>
</ac:structured-macro>

The <ac:structured-macro> could exist anywhere in the XML tree. I know that I could do this with regexes, but I would prefer to do it the correct way as I think that would be more robust. I hope there's somewhere I can pass the prefix_map and have it honor the ac namespace.

Upvotes: 1

Views: 215

Answers (1)

dabingsou

Reputation: 2469

I am not familiar with lxml, so here is an alternative using the simplified_scrapy library, for your reference only.

from simplified_scrapy import SimplifiedDoc

html = '''
<ac:structured-macro ac:name="center">
    <ac:rich-text-body>
    <p>
      some text
    </p>
  </ac:rich-text-body>
</ac:structured-macro>
'''
doc = SimplifiedDoc(html)
structuredMacro = doc.select('ac:structured-macro')
structuredMacro.setAttr('ac:name', 'new_center')
# Or
# structuredMacro.setAttrs({'ac:name': 'new_center'})

print(doc.html)

Result:

<ac:structured-macro ac:name="new_center">
    <ac:rich-text-body>
    <p>
      some text
    </p>
  </ac:rich-text-body>
</ac:structured-macro>

Upvotes: 1
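
For comparison, here is a minimal sketch of the same change done on the lxml side. The `{ac}name` call in the question adds a second attribute because the text inside the braces is read as a namespace URI, not as a prefix, so `ac` becomes a brand-new namespace and gets the generated `ns0` prefix. The sketch assumes the fragment can be wrapped in a throwaway `<root>` element (a hypothetical name) that declares the prefixes from `prefix_map`, so the parser can resolve `ac:` without `recover=True`. `rename_macro` is likewise a made-up helper name. Only calls shared by lxml and the standard library's ElementTree are used, hence the fallback import.

```python
try:
    from lxml import etree
except ImportError:  # stdlib fallback; every call below exists in both
    import xml.etree.ElementTree as etree

AC = "http://www.atlassian.com/schema/confluence/4/ac/"
RI = "http://www.atlassian.com/schema/confluence/4/ri/"
prefix_map = {"ac": AC, "ri": RI}

pagexml = """<ac:structured-macro ac:name="center">
  <ac:rich-text-body>
    <p>
      some text
    </p>
  </ac:rich-text-body>
</ac:structured-macro>"""


def rename_macro(fragment, old, new):
    """Hypothetical helper: change ac:name from `old` to `new` on every macro."""
    # Keep the original prefixes on output (needed by the stdlib fallback,
    # harmless under lxml, which remembers the parsed prefixes anyway).
    for prefix, uri in prefix_map.items():
        etree.register_namespace(prefix, uri)

    # Assumption: wrap the fragment in a throwaway <root> that declares the
    # prefixes, so "ac:" resolves to a real namespace while parsing.
    decls = " ".join('xmlns:%s="%s"' % (p, u) for p, u in prefix_map.items())
    root = etree.fromstring("<root %s>%s</root>" % (decls, fragment))

    # Clark notation takes the namespace URI in braces, not the prefix.
    tag = "{%s}structured-macro" % AC
    attr = "{%s}name" % AC
    for elem in root.iter(tag):  # matches the macro at any depth
        if elem.get(attr) == old:
            elem.set(attr, new)

    # Serialize the wrapper's children, leaving the wrapper itself out.
    return "".join(etree.tostring(child, encoding="unicode") for child in root)


result = rename_macro(pagexml, "center", "new_center")
print(result)
```

Note that the serialized element carries `xmlns:` declarations, because the fragment does not declare the prefixes itself. If whatever consumes the output expects the bare fragment, those declarations would have to be stripped afterwards.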
