amphibient

Reputation: 31278

How to replace/remove XML tag with BeautifulSoup?

I have XML in a local file that serves as a template for the final message that gets POSTed to a REST service. The script preprocesses the template data before it is posted.

So the template looks something like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
    <singleElement>
        <subElementX>XYZ</subElementX>
    </singleElement>
    <repeatingElement id="11" name="Joe"/>
    <repeatingElement id="12" name="Mary"/>
</root>

The message XML should look the same except that the repeatingElement tags need to be replaced with something else (XML generated by the script based on the attributes in the existing tag).

Here is my script so far:

from bs4 import BeautifulSoup

xmlData = None

with open('conf//test1.xml', 'r') as xmlFile:
    xmlData = xmlFile.read()

xmlSoup = BeautifulSoup(xmlData, 'html.parser')

repElemList = xmlSoup.find_all('repeatingelement')

for repElem in repElemList:
    print("Processing repElem...")
    repElemID = repElem.get('id')
    repElemName = repElem.get('name')

    # now I do something with repElemID and repElemName
    # and no longer need it. I would like to replace it with <somenewtag/>
    # and dump what is in the soup object back into a string.
    # is it possible with BeautifulSoup?

Can I replace the repeating elements with something else and then dump the soup object into a new string that I can post to my REST API?

NOTE: I am using html.parser because I can't get the xml parser to work. It works all right for my purposes, on the understanding that HTML parsing is more permissive than XML parsing.

Upvotes: 2

Views: 3545

Answers (1)

alecxe

Reputation: 474061

You can use the .replace_with() and .new_tag() methods:

for repElem in repElemList:
    print("Processing repElem...")
    repElemID = repElem.get('id')
    repElemName = repElem.get('name')

    repElem.replace_with(xmlSoup.new_tag("somenewtag"))

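Then you can dump the soup back into a string with str(xmlSoup) or xmlSoup.prettify(). Note that prettify() re-indents the markup and adds newlines, so str() is the closer match to the original layout.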
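
For reference, here is a minimal end-to-end sketch of the approach, assuming the template is held in a string rather than read from conf//test1.xml (the XML declaration is left out for brevity). The build_message helper and the ref attribute are made up for illustration, and somenewtag is just the placeholder name from the question. Because html.parser lowercases tag names, the search term is lowercase and the other tags come back lowercased as well.

```python
from bs4 import BeautifulSoup

# Trimmed inline copy of the template from the question
# (normally this would be read from a file).
TEMPLATE = """<root>
    <singleElement>
        <subElementX>XYZ</subElementX>
    </singleElement>
    <repeatingElement id="11" name="Joe"/>
    <repeatingElement id="12" name="Mary"/>
</root>"""


def build_message(xml_data):
    """Replace every repeating element with a new tag and return the markup as a string.

    Hypothetical helper, written only for this example.
    """
    soup = BeautifulSoup(xml_data, "html.parser")

    # html.parser lowercases tag names, hence the lowercase search term.
    # find_all() returns a list, so replacing elements inside the loop is safe.
    for rep_elem in soup.find_all("repeatingelement"):
        # "somenewtag" and "ref" are placeholders, not part of any real schema.
        new_tag = soup.new_tag("somenewtag", ref=rep_elem.get("id"))
        rep_elem.replace_with(new_tag)

    # Serialize the modified tree back into a string.
    return str(soup)


message = build_message(TEMPLATE)
print(message)
```

The returned string can then be used as the POST body. If the REST service is case-sensitive about element names, the lowercasing done by html.parser may be a problem; the "xml" parser (which requires lxml to be installed) preserves case and is probably worth another try in that situation.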

Upvotes: 3
