Sathish

Reputation: 196

How to get the xml element as a string with namespace using ElementTree in python?

I need to get elements from an XML document as strings. I am working with the XML below.

<xml>
    <prot:data xmlns:prot="prot">
        <product-id-template>
            <prot:ProductId>PRODUCT_ID</prot:ProductId>
        </product-id-template>

        <product-name-template>
            <prot:ProductName>PRODUCT_NAME</prot:ProductName>
        </product-name-template>

        <dealer-template>
            <xsi:Dealer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">DEALER</xsi:Dealer>
        </dealer-template>
    </prot:data>
</xml>

I tried the code below:

from xml.etree import ElementTree as ET

def get_template(xpath, namespaces):
    # Return the first element matching xpath, or None if nothing matches.
    tree = ET.parse('cdata.xml')
    root = tree.getroot()
    return root.find(xpath, namespaces=namespaces)

# Map the prefix used in the XPath to the namespace URI declared in the document.
namespace = {"prot": "prot"}
aa = get_template(".//prot:ProductId", namespace)
print(ET.tostring(aa).decode())

Actual output:

<ns0:ProductId xmlns:ns0="prot">PRODUCT_ID</ns0:ProductId>

Expected output:

<prot:ProductId>PRODUCT_ID</prot:ProductId>

The xmlns declaration must be kept where it is present on the element in the document, and omitted where it is not. For example, the element inside product-id-template carries no xmlns declaration, so it should be returned without one. The element inside dealer-template does carry an xmlns declaration, so it should be returned with it.

How can I achieve this?

Upvotes: 2

Views: 1024

Answers (1)

qwermike

Reputation: 1486

You can remove the xmlns declaration from the serialized string with a regular expression.

import re
# ...
with_ns = ET.tostring(aa).decode()
no_ns = re.sub(r' xmlns(:\w+)?="[^"]+"', '', with_ns)
print(no_ns)
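
Note that the regex above removes the declaration but leaves the generated ns0 prefix in the tag name. A minimal sketch of one way to get the document's own prefixes back, assuming it is acceptable to register them globally with ET.register_namespace: the helper name element_to_string and its drop_prefixes parameter are hypothetical, introduced only for illustration. ElementTree does not record which element a declaration appeared on, so the caller has to say which declarations to drop.

```python
import re
from xml.etree import ElementTree as ET

XML = """<xml>
    <prot:data xmlns:prot="prot">
        <product-id-template>
            <prot:ProductId>PRODUCT_ID</prot:ProductId>
        </product-id-template>
        <dealer-template>
            <xsi:Dealer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">DEALER</xsi:Dealer>
        </dealer-template>
    </prot:data>
</xml>"""

NAMESPACES = {
    "prot": "prot",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}

# Ask the serializer to use these prefixes instead of ns0, ns1, ...
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)


def element_to_string(elem, drop_prefixes=()):
    """Serialize elem, removing xmlns declarations for the given prefixes.

    Hypothetical helper: the regex works on the serialized text, so it
    would also hit matching text inside element content or attributes.
    """
    # strip() removes the tail whitespace that follows the element.
    text = ET.tostring(elem, encoding="unicode").strip()
    for prefix in drop_prefixes:
        pattern = r' xmlns:%s="[^"]*"' % re.escape(prefix)
        text = re.sub(pattern, "", text)
    return text


root = ET.fromstring(XML)
product = root.find(".//prot:ProductId", NAMESPACES)
dealer = root.find(".//xsi:Dealer", NAMESPACES)

# prot is declared on an ancestor, so drop the declaration here.
product_text = element_to_string(product, drop_prefixes=["prot"])
# xsi is declared on the element itself, so keep it.
dealer_text = element_to_string(dealer)

print(product_text)
print(dealer_text)
```

With the sample document from the question, this prints the ProductId element without a declaration and the Dealer element with its xmlns:xsi declaration intact.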

UPDATE: There is also a rather wild option, although I can't recommend it, because I'm not a Python expert.

I checked the ElementTree source code and found that this hack is possible:

def my_serialize_xml(write, elem, qnames, namespaces,
                     short_empty_elements, **kwargs):
    ET._serialize_xml(write, elem, qnames,
                      None, short_empty_elements, **kwargs)

ET._serialize["xml"] = my_serialize_xml

This defines my_serialize_xml, which calls ElementTree._serialize_xml with namespaces=None. It then replaces the value for the key "xml" in the ElementTree._serialize dictionary with my_serialize_xml, so ElementTree.tostring uses my_serialize_xml when it serializes an element.

To try it, place the code above after from xml.etree import ElementTree as ET and before the first use of ET.

Upvotes: 1
