Reputation: 5
I'm just starting out with Python, and a friend asked me for an application that finds specific data on the web and presents it nicely. I have already found a website that contains the data and I can extract the basic information, but the challenge is to dig deeper.
Using BS4 in Python 3.4, I have arrived at the following example markup:
<tr class=" " somethingc1="" somethingc2="" somethingc3="" data-something="1" something="1something6" something_id="6something0">
<td class="text-center td_something">
<div>
<a href="something/126" target="_blank">Super String of Something</a>
</div>
</td>
<td class="text-center">08/26 15:00</td>
<td class="text-center something_status">
<span class="something_status_something">Full</span>
</td>
</tr>
<tr class=" " somethingc1="" somethingc2="" somethingc3="" data-something="0" something="1something4" something_id="6something7">
<td class="text-center td_something">
<div>
<a href="something/146" target="_blank">Super String of Something</a>
</div>
</td>
<td class="text-center">05/26 15:00</td>
<td class="text-center something_status">
<span class="something_status_something"></span>
</td>
</tr>
What I want to do now is find the date string, but only if the parent has data-something="1" and not if it has data-something="0".
I can scrape all the dates with:
soup.find_all(lambda tag: tag.name == 'td' and tag.get('class') == ['text-center'] and not tag.has_attr('style'))
but that does not check the parent. That is why I tried:
def KieMeWar(tag):
    return tag.name == 'td' and tag.parent.name == 'tr' and tag.parent.attrs == {"data-something": "1"} #and tag.get('class') == ['text-center'] and not tag.has_attr('style')

soup.find_all(KieMeWar)
The result is an empty set. What is wrong, and what is the easiest way to get the result I am after?
P.S. This is an example excerpt of the full markup. That is why I check for the absence of a style attribute: it does not appear here, but it does later in the page.
Upvotes: 0
Views: 2230
Reputation: 7872
BeautifulSoup's findAll has the attrs kwarg, which is used to find tags with a given attribute:
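import bs4
# html is assumed to hold the markup from the question; naming the parser avoids a warning
soup = bs4.BeautifulSoup(html, 'html.parser')
trs = soup.findAll('tr', attrs={'data-something':'1'})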
That finds all tr tags with the attribute data-something="1". Afterwards, you can loop through the trs and grab the 2nd td tag to extract the date:
for t in trs:
    print(t.findAll('td')[1].text)
>>> 08/26 15:00
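As for why the function in the question returns nothing: tag.parent.attrs is a dict of every attribute on the tr (class, somethingc1, something_id and so on), so it can never equal a dict with only data-something in it. If you would rather keep the filter-function approach, comparing just that one attribute should work. Below is a minimal sketch under a few assumptions: the markup is a trimmed copy of the rows from the question wrapped in a table, the name is_date_cell is only illustrative, and the built-in html.parser is used.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the two rows from the question (most filler attributes omitted).
html = """
<table>
<tr class=" " data-something="1" something_id="6something0">
  <td class="text-center td_something"><div><a href="something/126">Super String of Something</a></div></td>
  <td class="text-center">08/26 15:00</td>
  <td class="text-center something_status"><span class="something_status_something">Full</span></td>
</tr>
<tr class=" " data-something="0" something_id="6something7">
  <td class="text-center td_something"><div><a href="something/146">Super String of Something</a></div></td>
  <td class="text-center">05/26 15:00</td>
  <td class="text-center something_status"><span class="something_status_something"></span></td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def is_date_cell(tag):
    """Match a plain text-center td whose parent tr has data-something="1"."""
    parent = tag.parent
    return (
        tag.name == "td"
        and tag.get("class") == ["text-center"]   # exactly one class, as in the question
        and not tag.has_attr("style")
        and parent is not None
        and parent.name == "tr"
        # Compare the single attribute, not the whole attrs dict.
        and parent.get("data-something") == "1"
    )


dates = [td.get_text(strip=True) for td in soup.find_all(is_date_cell)]
print(dates)
```

Compared with indexing the 2nd td, this does not depend on the column position, but it does depend on the date cell being the only td whose class is exactly text-center.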
Upvotes: 1