Removing text in a <sup> tag from a span while scraping the rest of the text

Question

I'm trying to scrape text with Beautiful Soup. I need the text inside a span with a specific class, but I want to discard the superscript numbers inside the same span, which sit in sup tags with a different class. I can easily use get_text() to pull the text out of the span, but I end up with the superscript numbers as well. The solution needs to discard every sup tag in the span along with its text contents.

Example HTML:

<span><sup>16</sup> The text I want</span>

What I get right now: 16 The text I want

What I want: The text I want
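A minimal reproduction of the behaviour described above, assuming markup shaped like the example. The class names `verse` and `num` are hypothetical stand-ins, since the real ones aren't shown. `get_text()` joins every descendant string, including the ones inside `<sup>`:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: "verse" and "num" are placeholder class names
html = '<span class="verse"><sup class="num">16</sup> The text I want</span>'

soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", class_="verse")

# get_text() concatenates all descendant strings, so the <sup> text comes along
print(span.get_text())  # -> 16 The text I want
```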

Michael Dz · Accepted Answer

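You can remove every sup tag, along with its text, by calling .extract() on each one before calling get_text(). Note that parsed_element.sup.extract() on its own would only remove the first sup tag.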

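import bs4 as bs

html = '<span><sup>16</sup>The text I want</span>'

parsed_element = bs.BeautifulSoup(html, 'html.parser')
# Remove every <sup> tag, including its text, from the parse tree
for s in parsed_element('sup'):
    s.extract()
text = parsed_element.get_text()  # 'The text I want'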
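The same idea scoped to the span's class, as the question asks, and handling several sup tags per span. This is a sketch assuming hypothetical class names `verse` (span) and `num` (sup). It uses `decompose()`, which destroys the removed tag rather than returning it as `extract()` does, and also shows a non-destructive alternative that keeps only strings with no `<sup>` ancestor, leaving the parse tree untouched:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; "verse" and "num" stand in for the real class names
html = """
<p>
  <span class="verse"><sup class="num">16</sup> The text I want</span>
  <span class="verse"><sup class="num">17</sup> More text<sup class="num">a</sup> here</span>
</p>
"""


def texts_without_sup(markup: str, span_class: str) -> list[str]:
    """Delete every <sup> inside matching spans, then return each span's text."""
    soup = BeautifulSoup(markup, "html.parser")
    spans = soup.find_all("span", class_=span_class)
    for span in spans:
        # find_all() returns a list, so decomposing while looping is safe
        for sup in span.find_all("sup"):
            sup.decompose()
    # Strip the joined text, not each piece, so inner spaces survive
    return [span.get_text().strip() for span in spans]


def texts_skipping_sup(markup: str, span_class: str) -> list[str]:
    """Non-destructive variant: keep only strings that are not inside a <sup>."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for span in soup.find_all("span", class_=span_class):
        parts = [
            s for s in span.find_all(string=True)
            if s.find_parent("sup") is None
        ]
        results.append("".join(parts).strip())
    return results


print(texts_without_sup(html, "verse"))   # -> ['The text I want', 'More text here']
print(texts_skipping_sup(html, "verse"))  # -> ['The text I want', 'More text here']
```

Stripping the joined string matters here: get_text(strip=True) strips each fragment separately, which would turn " More text" and " here" into "More texthere".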
