Anu

Reputation: 119

BeautifulSoup find_all by class attribute

I have the following HTML:

<div class="_cFb">  
<div class="_XWk">Rabindranath Tagore</div>
</div>

I used the following Python code to extract the text content:

soup.find_all('div', attrs={'class':'._XWk'})

This code returns an empty list. However, I am able to find other class attributes that don't begin with an underscore (_). Any ideas on how to extract the tag's text?

Upvotes: 1

Views: 8190

Answers (2)

宏杰李

Reputation: 12158

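soup.find_all('div', attrs={'class':'_XWk'})

Remove the . from ._XWk. The class value is matched as a literal string, and the leading dot is CSS selector syntax, not part of the class name. The underscore is not the problem.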
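To make the difference concrete, here is a minimal sketch using the HTML from the question. The variable names (markup, with_dot, without_dot, texts) are only illustrative, and it assumes the built-in 'html.parser' backend:

```python
from bs4 import BeautifulSoup

markup = '''<div class="_cFb">
<div class="_XWk">Rabindranath Tagore</div>
</div>'''
soup = BeautifulSoup(markup, 'html.parser')

# With the dot, the filter looks for a class literally named "._XWk",
# which does not exist in the document.
with_dot = soup.find_all('div', attrs={'class': '._XWk'})

# Without the dot, the filter matches the actual class name.
without_dot = soup.find_all('div', attrs={'class': '_XWk'})

# get_text() extracts the text content of each matched tag.
texts = [div.get_text() for div in without_dot]

print(with_dot)  # []
print(texts)     # ['Rabindranath Tagore']
```

Calling get_text() on each result is one way to get at the tag text the question asks for.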

Upvotes: 0

Andrew Che

Reputation: 968

This works:

>>> import bs4
>>> soup = bs4.BeautifulSoup('''<div class="_cFb">  
... <div class="_XWk">Rabindranath Tagore</div>
... </div>''', 'html.parser')
>>> soup.find_all('div', class_='_XWk')
[<div class="_XWk">Rabindranath Tagore</div>]

I found how to search by class in the documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-by-css-class

By the way, the lxml library, which can also be used for parsing HTML, supports searching with CSS selectors.
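As a related sketch, BeautifulSoup itself also accepts CSS selectors through select() and select_one(), so the same search can be written without switching to lxml. In a selector, the leading dot is correct, because it is the syntax for "class". The variable names here are only illustrative, and this uses BeautifulSoup rather than lxml:

```python
from bs4 import BeautifulSoup

markup = '''<div class="_cFb">
<div class="_XWk">Rabindranath Tagore</div>
</div>'''
soup = BeautifulSoup(markup, 'html.parser')

# select() takes a CSS selector; "div._XWk" means a <div> with class "_XWk".
matches = soup.select('div._XWk')

# select_one() returns the first match (or None). The ">" combinator
# restricts the search to direct children of <div class="_cFb">.
first = soup.select_one('div._cFb > div._XWk')

print([m.get_text() for m in matches])  # ['Rabindranath Tagore']
print(first.get_text())                 # Rabindranath Tagore
```

So '._XWk' belongs in select(), while find_all() with class_ or attrs expects the bare class name '_XWk'.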

Upvotes: 5
