moglido

Reputation: 148

Extract the HTML from between two HTML tags in BeautifulSoup 4.6

I want to get the HTML between two tags with bs4. Is there an equivalent of JavaScript's .innerHTML in Beautiful Soup?

This is code that finds a span with the class "title", and gets the text out of it.

    def get_title(soup):
        # Find the first <span class="title"> and return its text as UTF-8 bytes
        title = soup.find('span', {'class': 'title'})
        return title.text.encode('utf-8')

This function returns only the text of the span, so the <sub> tags are lost: 'Title about H2O and CO2'

The following markup is the result of title = soup.find('span', {'class': 'title'}):

<span class="title">Title about H<sub>2</sub>O and CO<sub>2</sub></span>

How do I get the inner markup without the enclosing span?

Desired result: 'Title about H<sub>2</sub>O and CO<sub>2</sub>'

Upvotes: 1

Views: 331

Answers (1)

moglido

Reputation: 148

After finding out that JavaScript calls this .innerHTML, I was able to search for the way to do it in Beautiful Soup. I found the answer in this question.

After selecting the element with BS4, you can use .decode_contents(formatter='html') to get the innerHTML, that is, the element's children rendered as a string without the enclosing tag.

element.decode_contents(formatter="html")
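
For completeness, here is a minimal sketch that applies this to the span from the question. It assumes the built-in "html.parser" backend, and the helper name get_title_html is only illustrative; it is not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = '<span class="title">Title about H<sub>2</sub>O and CO<sub>2</sub></span>'

# "html.parser" is an assumption; any parser supported by bs4 should work here.
soup = BeautifulSoup(html, "html.parser")
title = soup.find("span", {"class": "title"})


def get_title_html(element):
    """Return the markup inside `element`, without its own enclosing tag."""
    return element.decode_contents(formatter="html")


text_only = title.get_text()        # drops the <sub> tags
inner_html = get_title_html(title)  # keeps the <sub> tags

print(text_only)   # Title about H2O and CO2
print(inner_html)  # Title about H<sub>2</sub>O and CO<sub>2</sub>
```

Note that soup.find returns None when nothing matches, so real code would probably want to check for that before calling decode_contents.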

Upvotes: 1
