Yeahprettymuch
Yeahprettymuch

Reputation: 541

Using BeautifulSoup to populate (and identify) empty xml tags

Populating empty XML tags is not something that I can seem to readily find a usable solution for.

Let's say we receive an XML snippet like the following, containing the info of a customer:

<TransactionDetails>
    <Name>Jamie Silver</Name>
    <CustomerID>1234567</CustomerID>
    <StaffID>9876543</StaffID>
</TransactionDetails>

Sometimes, the snippet we receive might not have the StaffID of whoever served them. In this case, the XML snippet shows the StaffID tag very differently:

<TransactionDetails>
    <Name>Jamie Silver</Name>
    <CustomerID>1234567</CustomerID>
    <StaffID/>
</TransactionDetails>

So what happens when the StaffID is missing is that the <StaffID></StaffID> gets truncated to just <StaffID/>, where the forward slash is moved to the back.

What I'm trying to do is to insert a filled value into the XML file using BeautifulSoup, but it also needs to fix the incorrect truncation (so that <StaffID/> is turned back into <StaffID></StaffID> first.

Upvotes: 0

Views: 462

Answers (1)

facelessuser
facelessuser

Reputation: 1734

CSS selectors are usually used with HTML, but many work just fine with XML. Since you are using XML, we will use the lxml-xml parser. And we will use the :empty selector. As long as the element has no children and only contains whitespace, this will work for us. This is using the css-selector-4 definition of :empty: https://drafts.csswg.org/selectors-4/#the-empty-pseudo.

The below example targets StaffID that are empty. Then we replace there .string with 0000000. As there is only one instance of the empty element, only that one will change.

from bs4 import BeautifulSoup

XML = """
<root>
<TransactionDetails>
    <Name>Jamie Silver</Name>
    <CustomerID>1234567</CustomerID>
    <StaffID/>
</TransactionDetails>
<TransactionDetails>
    <Name>Jamie Silver</Name>
    <CustomerID>1234567</CustomerID>
    <StaffID>9876543</StaffID>
</TransactionDetails>
</root>
"""

soup = BeautifulSoup (XML, 'lxml-xml')
els = soup.select('StaffID:empty')

for el in els:
    el.string = "0000000"

print(soup)

Output:

<?xml version="1.0" encoding="utf-8"?>                                                                                                                      
<root>                                                                                                                                                      
<TransactionDetails>                                                                                                                                        
<Name>Jamie Silver</Name>                                                                                                                                   
<CustomerID>1234567</CustomerID>                                                                                                                            
<StaffID>0000000</StaffID>                                                                                                                                  
</TransactionDetails>                                                                                                                                       
<TransactionDetails>                                                                                                                                        
<Name>Jamie Silver</Name>                                                                                                                                   
<CustomerID>1234567</CustomerID>                                                                                                                            
<StaffID>9876543</StaffID>                                                                                                                                  
</TransactionDetails>                                                                                                                                       
</root>

Hopefully that helps.

Upvotes: 1

Related Questions