How to properly get an element with BeautifulSoup?

Question

I'm new to Python and trying to parse some simple HTML. However, one thing stops me. For example, I have this HTML:


 
<div>
some unnecessary text here
</div>
<div class="text">
Here's the desired text!
</div>

I need to extract the text from the second div (the one with class "text"). This is how I get it:

print(repr(link.find('div').findNextSibling()))

However, this returns the whole div, tags included:

<div class="text">Here's the desired text!</div>

And I don't know how to get the text only.

Adding .text results in \u043a\u0430\u043a \u0440\u0430\u0437\u0440\u0430\u0431 strings\
Adding .strings returns "None"
Adding .string returns both "None" and \u042f\u0445\u0438\u043a\u043e - \u0435\u0441\u043b\u0438\

Maybe there's something wrong with repr?

P.S. I also need to keep the tags inside the div.

Birei · Accepted Answer

Why don't you simply search for the <div> element based on its class attribute? Something like the following seems to work for me:

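from bs4 import BeautifulSoup

html = '''
<div>
some unnecessary text here
</div>
<div class="text">
Here's the desired text!
</div>
'''

# 'html.parser' is the parser bundled with Python, so nothing extra to install
link = BeautifulSoup(html, 'html.parser')
# Look the div up by its class, then strip the surrounding whitespace
print(link.find('div', class_="text").text.strip())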

It yields:

Here's the desired text!
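
As for the P.S. about keeping the tags inside the div: a minimal sketch, assuming the div contains some inline markup. The <b> tag below is made up for illustration and is not in the original HTML. It compares get_text(), decode_contents() and .string on the same element. The \u... escapes in the question are most likely just how repr displays Cyrillic unicode strings in Python 2, so printing without repr should show them normally.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the inner <b> tag is added only to illustrate the P.S.
html = '''
<div>some unnecessary text here</div>
<div class="text">Here's the <b>desired</b> text!</div>
'''

soup = BeautifulSoup(html, 'html.parser')
div = soup.find('div', class_='text')

# Text only, with the inner tags dropped
plain_text = div.get_text().strip()
# Inner markup, without the enclosing <div> itself
inner_html = div.decode_contents().strip()
# None here, because the div has more than one child node
only_string = div.string

print(plain_text)
print(inner_html)
print(only_string)
```

So .string is only useful when the tag holds a single piece of text, which would explain the "None" results in the question. For anything with nested tags, get_text() (or .text) gives the text and decode_contents() gives the inner HTML.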
