I want to use bs4 to get the HTML between an element's opening and closing tags. Is there an equivalent of JavaScript's .innerHTML in Beautiful Soup?
This code finds a span with the class "title" and gets its text:
def get_title(soup):
    title = soup.find('span', {'class': 'title'})
    return title.text.encode('utf-8')
This function returns the span's text with the <sub> tags stripped out, so the subscripts are lost: 'Title about H2O and CO2'
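To see why the subscripts disappear, here is a minimal sketch that parses the span from a string. It assumes Python 3 with beautifulsoup4 installed and uses the built-in "html.parser" (the question does not say which parser was used). .text keeps only the text nodes, while str() keeps the markup but also includes the outer span.

```python
from bs4 import BeautifulSoup

html = '<span class="title">Title about H<sub>2</sub>O and CO<sub>2</sub></span>'
soup = BeautifulSoup(html, "html.parser")
title = soup.find('span', {'class': 'title'})

# .text (same as .get_text()) concatenates only the text nodes,
# so the <sub> tags themselves are discarded
print(title.text)   # Title about H2O and CO2

# str(tag) keeps the inner markup, but also the outer <span> tags
print(str(title))   # <span class="title">Title about H<sub>2</sub>O and CO<sub>2</sub></span>
```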
This is the element returned by title = soup.find('span', {'class': 'title'}):
<span class="title">Title about H<sub>2</sub>O and CO<sub>2</sub></span>
How do I get the span's contents, markup included, without the enclosing span tags?
Desired result: 'Title about H<sub>2</sub>O and CO<sub>2</sub>'
After learning that JavaScript calls this .innerHTML, I was able to search for how to do the same in Beautiful Soup. I found the answer in this question.
After selecting the element with BS4, you can call .decode_contents(formatter="html") on it to get its innerHTML:
element.decode_contents(formatter="html")
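Applied to the original get_title function, a minimal sketch might look like this. It assumes Python 3, beautifulsoup4, and the built-in "html.parser"; the sample HTML is the span from the question.

```python
from bs4 import BeautifulSoup

def get_title(soup):
    title = soup.find('span', {'class': 'title'})
    # decode_contents() renders only the tag's children, which is the
    # equivalent of JavaScript's innerHTML (no outer <span> tags)
    return title.decode_contents(formatter="html")

html = '<span class="title">Title about H<sub>2</sub>O and CO<sub>2</sub></span>'
soup = BeautifulSoup(html, "html.parser")
print(get_title(soup))  # Title about H<sub>2</sub>O and CO<sub>2</sub>
```

Unlike the original function, this returns a str rather than UTF-8 bytes; append .encode('utf-8') if bytes are needed. Note that the "html" formatter converts non-ASCII characters to named HTML entities where one exists (e.g. é becomes &eacute;), whereas the default "minimal" formatter only escapes &, < and >.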