python 2.7, xml, beautifulsoup4: only return matching parent tag

Question

I'm trying to parse some XML, but I'm having trouble making it select the requested tag only when that tag is a parent tag. For example, part of my XML is:

I've gotten the whole thing parsed how I need it with the exception of the "Total" tag.

The total tag I'm trying to get is:

What's happening is that it's returning the "Total" tag that is a child of RoomRates\RoomRate\Rates\Rate. I'm trying to figure out how to make it return only the RoomStays\RoomStay\Total tag. What I currently have is:

from bs4 import BeautifulSoup as bs

soup = bs(response, "xml")

messages = soup.find_all('Message')

for message in messages:
    hotel_code = message.get('HotelCode')

    reservations = message.find_all('HotelReservation')
    for reservation in reservations:
        uniqueid_id = reservation.UniqueID.get('ID')
        uniqueid_idcontext = reservation.UniqueID.get('ID_Context')

        roomstays = reservation.find_all('RoomStay')
        for roomstay in roomstays:
            # Attribute access returns the first Total descendant,
            # which is the one nested under Rate, not the RoomStay-level one.
            total = roomstay.Total

Any ideas on how to specify the exact tag I'm trying to pull? If anyone is wondering about the for loops, it's because there are normally multiple "Message", "HotelReservation", "RoomStay", etc. tags, but I've removed them to show only one. There can also sometimes be multiple Rate\Rates tags, so I can't just ask it to give me the 2nd "Total" tag.

Hopefully I've explained this okay.

abarnert · Accepted Answer

There can also sometimes be multiple Rate\Rates tags, so I can't just ask it to give me the 2nd "Total" tag.

Why not just iterate over all the Total tags and skip the ones that have no Taxes child?

reservations = message.find_all('HotelReservation')
for reservation in reservations:
    totals = reservation.find_all('Total')
    for total in totals:
        if total.find('Taxes'):
            pass  # do stuff: this Total contains a Taxes tag
        else:
            pass  # these aren't the totals you're looking for

If you more generally want to eliminate those that have no child nodes, you could do either of these:

if next(total.children, None):
    pass  # it's a parent of something

if total.contents:
    pass  # it's a parent of something

Or you could use a function instead of a string as your filter:

total = reservation.find(lambda node: node.name == 'Total' and node.contents)

Or you could look at other ways to locate this tag: it's a direct child of RoomStay rather than just a descendant; it's not a descendant of Rate; it's the last Total descendant under a RoomStay; etc. All of these can be done just as easily.
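As a rough sketch of the "direct child of RoomStay" option, find() can be told not to recurse. The XML below is a hypothetical, cut-down stand-in for the real response (the element nesting follows the paths named in the question; the AmountAfterTax and Amount attributes are assumptions made up for illustration), and it assumes lxml is installed so that the "xml" parser is available:

```python
from bs4 import BeautifulSoup

# Hypothetical sample; the real response has more elements and attributes.
SAMPLE = """<RoomStays>
  <RoomStay>
    <RoomRates>
      <RoomRate>
        <Rates>
          <Rate>
            <Total AmountAfterTax="100.00"/>
          </Rate>
        </Rates>
      </RoomRate>
    </RoomRates>
    <Total AmountAfterTax="120.00">
      <Taxes Amount="20.00"/>
    </Total>
  </RoomStay>
</RoomStays>"""

soup = BeautifulSoup(SAMPLE, "xml")

stay_totals = []
for roomstay in soup.find_all("RoomStay"):
    # recursive=False limits the search to direct children of RoomStay,
    # so the Total nested under Rate is never considered.
    total = roomstay.find("Total", recursive=False)
    if total is not None:
        stay_totals.append(total.get("AmountAfterTax"))

print(stay_totals)
```

This does not depend on how many Rate tags there are, since none of their Total tags are direct children of RoomStay.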

That being said, this seems like a perfect job for XPath, which BeautifulSoup doesn't support, but ElementTree and lxml do…
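For comparison, here is a minimal sketch using the limited XPath subset in the standard library's ElementTree. The sample XML is hypothetical (same nesting as the paths in the question, with made-up AmountAfterTax and Amount attributes) and has no namespaces; if the real document declares an XML namespace, the tag names in the paths would need to include it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real response has more elements and attributes.
SAMPLE = """<RoomStays>
  <RoomStay>
    <RoomRates>
      <RoomRate>
        <Rates>
          <Rate>
            <Total AmountAfterTax="100.00"/>
          </Rate>
        </Rates>
      </RoomRate>
    </RoomRates>
    <Total AmountAfterTax="120.00">
      <Taxes Amount="20.00"/>
    </Total>
  </RoomStay>
</RoomStays>"""

root = ET.fromstring(SAMPLE)

# Total elements that are direct children of any RoomStay.
direct = root.findall(".//RoomStay/Total")

# Alternatively: Total elements that have a Taxes child.
with_taxes = root.findall(".//Total[Taxes]")

direct_amounts = [t.get("AmountAfterTax") for t in direct]
taxed_amounts = [t.get("AmountAfterTax") for t in with_taxes]
print(direct_amounts)
print(taxed_amounts)
```

On this sample both paths select the same RoomStay-level Total; which one fits better depends on whether the position or the Taxes child is the more reliable marker in the real data.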
