Reputation: 2198
I'm trying to parse some XML, but I'm having trouble making it select the requested tag only when that tag is a parent tag. For example, part of my XML is:
<Messages>
  <Message ChainCode="LI" HotelCode="5501" ConfirmationID="5501">
    <MessageContent>
      <OTA_HotelResNotifRQ TimeStamp="2014-01-24T21:02:43.9318703Z" Version="4" ResStatus="Book">
        <HotelReservations>
          <HotelReservation>
            <RoomStays>
              <RoomStay MarketCode="CC" SourceOfBusiness="CRS">
                <RoomRates>
                  <RoomRate EffectiveDate="2014-02-04" ExpireDate="2014-02-06" RoomTypeCode="12112" NumberOfUnits="1" RatePlanCode="RAC">
                    <Rates>
                      <Rate EffectiveDate="2014-02-04" ExpireDate="2014-02-06" RateTimeUnit="Day" UnitMultiplier="3">
                        <Base AmountBeforeTax="749.25" CurrencyCode="USD" />
                        <Total AmountBeforeTax="749.25" CurrencyCode="USD" />
                      </Rate>
                    </Rates>
                  </RoomRate>
                </RoomRates>
                <Total AmountBeforeTax="2247.75" CurrencyCode="USD">
                  <Taxes Amount="0.00" />
                </Total>
              </RoomStay>
            </RoomStays>
          </HotelReservation>
        </HotelReservations>
      </OTA_HotelResNotifRQ>
    </MessageContent>
  </Message>
</Messages>
I've gotten the whole thing parsed how I need it with the exception of the "Total" tag.
The total tag I'm trying to get is:
<Total AmountBeforeTax="2247.75" CurrencyCode="USD">
  <Taxes Amount="0.00" />
</Total>
What's happening is that it's returning the "Total" tag that is a child of RoomRates\RoomRate\Rates\Rate. I'm trying to figure out how to make it return only the RoomStays\RoomStay\Total tag. What I currently have is:
from bs4 import BeautifulSoup as bs

soup = bs(response, "xml")
messages = soup.find_all('Message')
for message in messages:
    hotel_code = message.get('HotelCode')
    reservations = message.find_all('HotelReservation')
    for reservation in reservations:
        uniqueid_id = reservation.UniqueID.get('ID')
        uniqueid_idcontext = reservation.UniqueID.get('ID_Context')
        roomstays = reservation.find_all('RoomStay')
        for roomstay in roomstays:
            # returns the first Total descendant, i.e. the one nested under Rate
            total = roomstay.Total
Any ideas on how to specify the exact tag I'm trying to pull? If anyone is wondering about the for loops, it's because there are normally multiple "Message", "HotelReservation", "RoomStay", etc. tags, but I've removed them to show only one of each. There can also sometimes be multiple Rate\Rates tags, so I can't just ask it to give me the 2nd "Total" tag.
Hopefully I've explained this okay.
Upvotes: 0
Views: 67
Reputation: 365677
There can also sometimes be multiple Rate\Rates tags, so I can't just ask it to give me the 2nd "Total" tag.
Why not just iterate over all the Total tags and skip the ones that have no Taxes child?
reservations = message.find_all('HotelReservation')
for reservation in reservations:
    totals = reservation.find_all('Total')
    for total in totals:
        if total.find('Taxes'):
            # do stuff
            pass
        else:
            # these aren't the totals you're looking for
            pass
If you more generally want to eliminate those that have no child nodes, you could do either of these:
if next(total.children, None):
    # it's a parent of something
    pass

if total.contents:
    # it's a parent of something
    pass
Or you could use a function instead of a string as your filter:
total = reservation.find(lambda node: node.name == 'Total' and node.contents)
Or you could look at other ways to locate this tag: it's a direct child of RoomStay rather than just a descendant; it's not a descendant of Rate; it's the last Total descendant under a RoomStay; etc. All of these can be done just as easily.
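As a rough sketch of two of those alternatives (direct child of RoomStay, and not a descendant of Rate), something like the following should work. It assumes bs4 with lxml installed for the "xml" parser, and it uses a trimmed copy of the question's XML; the variable names direct_total and outside_rate are only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's sample, starting at RoomStays.
xml = """<RoomStays>
  <RoomStay MarketCode="CC" SourceOfBusiness="CRS">
    <RoomRates>
      <RoomRate RoomTypeCode="12112" NumberOfUnits="1" RatePlanCode="RAC">
        <Rates>
          <Rate RateTimeUnit="Day" UnitMultiplier="3">
            <Base AmountBeforeTax="749.25" CurrencyCode="USD" />
            <Total AmountBeforeTax="749.25" CurrencyCode="USD" />
          </Rate>
        </Rates>
      </RoomRate>
    </RoomRates>
    <Total AmountBeforeTax="2247.75" CurrencyCode="USD">
      <Taxes Amount="0.00" />
    </Total>
  </RoomStay>
</RoomStays>"""

soup = BeautifulSoup(xml, "xml")  # the "xml" parser needs lxml installed

for roomstay in soup.find_all("RoomStay"):
    # Option 1: search only the direct children of RoomStay.
    direct_total = roomstay.find("Total", recursive=False)

    # Option 2: take every Total and drop the ones nested inside a Rate.
    outside_rate = [
        t for t in roomstay.find_all("Total")
        if t.find_parent("Rate") is None
    ]

    print(direct_total["AmountBeforeTax"])
    print([t["AmountBeforeTax"] for t in outside_rate])
```

The recursive=False form is probably the closest match to "RoomStays\RoomStay\Total", since it does not depend on whether the Total happens to have a Taxes child.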
That being said, this seems like a perfect job for XPath, which BeautifulSoup doesn't support, but ElementTree and lxml do…
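For illustration, here is a minimal sketch of the path-based approach using the standard library's ElementTree, which supports a limited subset of XPath. It assumes the document has no XML namespaces, as in the sample shown in the question (real OTA messages may declare one, in which case the path would need a namespace prefix). The sample is trimmed to start at HotelReservations, and stay_totals is just an illustrative name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's sample, starting at HotelReservations.
xml = """<HotelReservations>
  <HotelReservation>
    <RoomStays>
      <RoomStay MarketCode="CC" SourceOfBusiness="CRS">
        <RoomRates>
          <RoomRate RoomTypeCode="12112" NumberOfUnits="1" RatePlanCode="RAC">
            <Rates>
              <Rate RateTimeUnit="Day" UnitMultiplier="3">
                <Base AmountBeforeTax="749.25" CurrencyCode="USD" />
                <Total AmountBeforeTax="749.25" CurrencyCode="USD" />
              </Rate>
            </Rates>
          </RoomRate>
        </RoomRates>
        <Total AmountBeforeTax="2247.75" CurrencyCode="USD">
          <Taxes Amount="0.00" />
        </Total>
      </RoomStay>
    </RoomStays>
  </HotelReservation>
</HotelReservations>"""

root = ET.fromstring(xml)

stay_totals = []
for reservation in root.iter("HotelReservation"):
    # Each "/" step matches direct children only, so the Total nested
    # under RoomRates/RoomRate/Rates/Rate is never selected.
    for total in reservation.findall("./RoomStays/RoomStay/Total"):
        stay_totals.append((total.get("AmountBeforeTax"),
                            total.get("CurrencyCode")))

print(stay_totals)
```

With lxml, the same relative path should also work through its xpath() method, which accepts full XPath 1.0 expressions.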
Upvotes: 1