Reputation: 119
I have the following HTML:
<div class="_cFb">
<div class="_XWk">Rabindranath Tagore</div>
</div>
I used the following Python code to extract the text content:
soup.find_all('div', attrs={'class':'._XWk'})
This returns an empty list. However, I can match other class attributes that don't begin with an underscore (_). Any ideas on how to extract the tag text?
Upvotes: 1
Views: 8190
Reputation: 12158
In [87]: soup.find_all('div', attrs={'class':'_XWk'})
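Remove the . from ._XWk. The leading dot is CSS selector syntax, not part of the class name, so find_all() should be given the bare name _XWk.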
Upvotes: 0
Reputation: 968
This works:
>>> import bs4
>>> soup = bs4.BeautifulSoup('''<div class="_cFb">
... <div class="_XWk">Rabindranath Tagore</div>
... </div>''', 'html.parser')
>>> soup.find_all('div', class_='_XWk')
[<div class="_XWk">Rabindranath Tagore</div>]
I found how to search by class here: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-by-css-class
By the way, the lxml library, which can also be used for parsing HTML, supports searching with CSS selectors.
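If you do want CSS selector syntax, Beautiful Soup has its own select() method, and that is the one place where the leading dot belongs. Here is a rough sketch comparing the three calls on the HTML from the question. The variable names are only for illustration, and it assumes a reasonably recent bs4 install:

```python
from bs4 import BeautifulSoup

html = '''<div class="_cFb">
<div class="_XWk">Rabindranath Tagore</div>
</div>'''

soup = BeautifulSoup(html, 'html.parser')

# CSS selector syntax: the leading dot means "class", so it is correct here.
by_selector = soup.select('div._XWk')

# Attribute filter: pass the bare class name, without the dot.
by_attribute = soup.find_all('div', class_='_XWk')

# With the dot, find_all() looks for a class literally named "._XWk".
with_dot = soup.find_all('div', attrs={'class': '._XWk'})

# get_text() pulls the text content out of each matched tag.
names = [div.get_text(strip=True) for div in by_selector]
print(names)
```

The first two calls should match the same div, while the third finds nothing, which is the behaviour described in the question.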
Upvotes: 5