Groosha

Reputation: 3067

How to properly get an element with BeautifulSoup?

I'm new to Python and trying to parse some simple HTML, but one thing is stopping me. For example, given this HTML:

<div class = "quote">
<div class = "whatever"> 
some unnecessary text here 
</div>
<div class = "text">
Here's the desired text!
</div>
</div>

I need to extract the text from the second div (the one with class "text"). This is how I get it now:

print(repr(link.find('div').findNextSibling()))

However, this returns the whole div element, tags included: <div class="text">Here's the desired text!</div>

I don't know how to get only the text.

Maybe there's something wrong with repr.

P.S. I also need to keep any tags nested inside the div.

Upvotes: 0

Views: 37

Answers (1)

Birei

Reputation: 36282

Why don't you simply search for the <div> element based on its class attribute? Something like the following seems to work for me:

from bs4 import BeautifulSoup

html = '''<div class = "quote">
<div class = "whatever"> 
some unnecessary text here 
</div>
<div class = "text">
Here's the desired text!
</div>
</div>'''


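# Name the parser explicitly; 'html.parser' ships with Python's standard library.
link = BeautifulSoup(html, 'html.parser')
# class_ (with the trailing underscore) avoids clashing with the class keyword.
print(link.find('div', class_="text").text.strip())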

It yields:

Here's the desired text!
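
As for the P.S. about keeping the tags inside the div: .text drops any nested markup, so it won't do for that. Here is a minimal sketch of one way to get the inner HTML without the outer <div>, using the tag's decode_contents() method. The nested <b> tag is a hypothetical addition to the sample, since the original HTML has no inner tags, and I'm assuming bs4 with the built-in 'html.parser'.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: same structure as the question, but the target div
# now contains a nested <b> tag so the difference is visible.
html = '''<div class="quote">
<div class="whatever">some unnecessary text here</div>
<div class="text">Here's the <b>desired</b> text!</div>
</div>'''

soup = BeautifulSoup(html, 'html.parser')
target = soup.find('div', class_='text')

# Text only: nested tags are dropped.
plain_text = target.get_text().strip()

# Inner markup: nested tags are kept, the enclosing <div> is not.
inner_html = target.decode_contents().strip()

print(plain_text)
print(inner_html)
```

If you want the enclosing <div> as well, str(target) should give you the whole element as a string.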

Upvotes: 1
