PerfectionQuest

Reputation: 5

bs4 parent attrs python

I'm just starting to code in Python, and a friend asked me for an application that finds specific data on the web and presents it nicely. I have already found a site that contains the data and I can extract the basic information, but the challenge is going deeper.

While using BS4 in Python 3.4, I ran into HTML like this example:

 <tr class=" " somethingc1="" somethingc2="" somethingc3="" data-something="1" something="1something6" something_id="6something0">
 <td class="text-center td_something">
 <div>
 <a href="something/126" target="_blank">Super String of Something</a>
 </div>
 </td>
 <td class="text-center">08/26 15:00</td>
 <td class="text-center something_status">
 <span class="something_status_something">Full</span>
 </td>
 </tr>
 <tr class=" " somethingc1="" somethingc2="" somethingc3="" data-something="0" something="1something4" something_id="6something7">
 <td class="text-center td_something">
 <div>
 <a href="something/146" target="_blank">Super String of Something</a>
 </div>
 </td>
 <td class="text-center">05/26 15:00</td>
 <td class="text-center something_status">
 <span class="something_status_something"></span>
 </td>
 </tr>

What I want to do now is find the date string, but only if the parent has data-something="1" and not if it has data-something="0".

I can scrape all dates with:

soup.find_all(lambda tag: tag.name == 'td' and tag.get('class') == ['text-center'] and not tag.has_attr('style'))

but that does not check the parent. That is why I tried:

def KieMeWar(tag):
    return tag.name == 'td' and tag.parent.name == 'tr' and tag.parent.attrs == {"data-something": "1"} #and tag.get('class') == ['text-center'] and not tag.has_attr('style')
soup.find_all(KieMeWar)

The result is empty. What is wrong, and what is the easiest way to get what I am aiming for?

P.S. This is a sample from the full page. That is why I check for the absence of a style attribute: it does not appear here, but it does later.

Upvotes: 0

Views: 2230

Answers (1)

Wondercricket

Reputation: 7872

BeautifulSoup's findAll accepts an attrs keyword argument, which finds tags that have a given attribute value:

import bs4
# html holds the page source; naming the parser explicitly avoids a bs4 warning
soup = bs4.BeautifulSoup(html, 'html.parser')
trs = soup.findAll('tr', attrs={'data-something': '1'})

That finds all tr tags with the attribute data-something="1". Afterwards, you can loop through trs and take the second td tag of each row to extract the date:

for t in trs:
    # the date is in the second td of each row
    print(t.findAll('td')[1].text)
# prints: 08/26 15:00
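As for why the attempt in the question returns nothing: tag.parent.attrs is a dictionary of all the attributes on the tr, so comparing it for equality with {"data-something": "1"} is only true when that is the row's sole attribute. A minimal sketch of a corrected predicate follows, assuming a trimmed copy of the question's HTML; the names HTML and is_date_cell are made up for illustration, and find_all is the current spelling of findAll.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical copy of the rows from the question.
HTML = """
<table>
<tr class=" " data-something="1" something_id="6something0">
  <td class="text-center td_something"><div><a href="something/126">Super String of Something</a></div></td>
  <td class="text-center">08/26 15:00</td>
  <td class="text-center something_status"><span>Full</span></td>
</tr>
<tr class=" " data-something="0" something_id="6something7">
  <td class="text-center td_something"><div><a href="something/146">Super String of Something</a></div></td>
  <td class="text-center">05/26 15:00</td>
  <td class="text-center something_status"><span></span></td>
</tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# The whole-dictionary comparison fails because the row has other attributes too.
first_row = soup.find("tr")
whole_dict_matches = first_row.attrs == {"data-something": "1"}  # False


def is_date_cell(tag):
    """Match a plain text-center td whose parent tr has data-something="1"."""
    parent = tag.parent
    return (
        tag.name == "td"
        and tag.get("class") == ["text-center"]
        and not tag.has_attr("style")
        and parent is not None
        and parent.name == "tr"
        # Look up the single attribute instead of comparing all of attrs.
        and parent.get("data-something") == "1"
    )


dates = [td.get_text(strip=True) for td in soup.find_all(is_date_cell)]
print(dates)
```

This keeps the question's own checks (exact class list, no style attribute) and only changes how the parent is tested, so it can be dropped in where KieMeWar was used.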

Upvotes: 1
