bs4 parent attrs python

Question

I'm just starting to code in Python, and a friend asked me for an application that finds specific data on the web and presents it nicely. I have already found a suitable website that contains the data, and I can extract the basic info, but the challenge is to get deeper.

While using BS4 in Python 3.4, I ended up with the following sample HTML:

 
 
 
 Super String of Something
 
 
 08/26 15:00
 
 Full
 
 
 
 
 
 Super String of Something
 
 
 05/26 15:00

What I want to do now is find the date string, but only if the parent has data-something="1", not if it has data-something="0".

I can scrape all the dates with:

soup.find_all(lambda tag: tag.name == 'td' and tag.get('class') == ['text-center'] and not tag.has_attr('style'))

but that does not check the parent. That is why I tried:

def KieMeWar(tag):
    return tag.name == 'td' and tag.parent.name == 'tr' and tag.parent.attrs == {"data-something": "1"} #and tag.get('class') == ['text-center'] and not tag.has_attr('style')
soup.find_all(KieMeWar)

The result is an empty list. What is wrong, and what is the easiest way to get the result I am after?

P.S. This is only a sample of the full HTML, which is why I check for the absence of style: it does not appear here, but it does later in the page.

Wondercricket · Accepted Answer

BeautifulSoup's findAll (spelled find_all in current bs4; findAll is the older alias) accepts an attrs keyword argument, which is used to find tags with a given attribute:

import bs4
# html is assumed to hold the page source as a string
soup = bs4.BeautifulSoup(html, 'html.parser')
trs = soup.findAll('tr', attrs={'data-something': '1'})

That finds all tr tags with the attribute data-something="1". Afterwards, you can loop through the trs and grab the second td tag in each to extract the date:

for t in trs:
    print(t.findAll('td')[1].text)
# prints: 08/26 15:00
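
If you would rather keep the single-function approach from the question, one likely reason the attempt returns nothing is that tag.parent.attrs == {"data-something": "1"} compares the whole attribute dict, so any extra attribute on the tr (a class, an id) makes it False. Checking just the one key with .get() avoids that. Below is a minimal sketch of that idea. The markup is an assumption, since the original HTML is not shown here, and is_wanted_date_cell is a made-up name. The select() line is an alternative that assumes a bs4 version with CSS selector support for :not() (4.7 or later).

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the structure and the extra class on <tr> are assumptions.
html = """
<table>
  <tr class="row" data-something="1">
    <td>Super String of Something</td>
    <td class="text-center">08/26 15:00</td>
    <td class="text-center" style="color: red">Full</td>
  </tr>
  <tr class="row" data-something="0">
    <td>Super String of Something</td>
    <td class="text-center">05/26 15:00</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def is_wanted_date_cell(tag):
    """Match unstyled td.text-center cells whose parent tr has data-something="1"."""
    parent = tag.parent
    return (
        tag.name == "td"
        and tag.get("class") == ["text-center"]
        and not tag.has_attr("style")
        and parent is not None
        and parent.name == "tr"
        # Check only the attribute of interest instead of the whole attrs dict.
        and parent.get("data-something") == "1"
    )


dates = [td.get_text(strip=True) for td in soup.find_all(is_wanted_date_cell)]

# Alternative: the same filter written as a CSS selector.
dates_css = [
    td.get_text(strip=True)
    for td in soup.select('tr[data-something="1"] > td.text-center:not([style])')
]

print(dates)
```

With the sample markup above, both dates and dates_css contain only the date from the row marked data-something="1". Note that td.text-center in the selector also matches cells that carry additional classes, whereas the function requires the class list to be exactly ['text-center'].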
