Get all text in a tag unless it is in another tag

Question

I'm trying to parse some HTML with BeautifulSoup, and I'd like to get all the text (recursively) in a tag, but I want to ignore all text that appears within a small tag. For example, this HTML:


  
  <a>
    Final
  </a>
  definition.
  <small>
    Fun fact.
  </small>

should give the text "Final definition." Note that this is a minimal example. The real HTML involves many other tags, so I need to exclude the small tag rather than include the a tag.

The text attribute of the tag is close to what I want, but it would include "Fun fact." I could concatenate the text of all child tags except the small tags, but that would leave out "definition.", which is not inside any child tag. I couldn't find a method like get_text_until (the small tag is always at the end), so what can I do?
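
The two attempts described above can be reproduced with a minimal sketch like the following. The div wrapper and the href value are assumptions made only so the snippet runs on its own; the markup is otherwise modelled on the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <div> wrapper and href are assumptions.
HTML = """
<div>
  <a href="#">
    Final
  </a>
  definition.
  <small>
    Fun fact.
  </small>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
wrapper = soup.find("div")

# Attempt 1: .text collects every descendant string, <small> included.
all_text = " ".join(wrapper.text.split())

# Attempt 2: join the text of the direct child tags except <small>.
# Bare text nodes such as "definition." are not tags, so they are lost.
tags_only = " ".join(
    child.get_text(strip=True)
    for child in wrapper.find_all(True, recursive=False)
    if child.name != "small"
)

print(all_text)   # includes "Fun fact."
print(tags_only)  # misses "definition."
```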

Wander Nauta · Accepted Answer

You can use find_all to find all the small tags, clear them, then call get_text():

>>> soup
<a>
    Final
  </a>
  definition.
  <small>
    Fun fact.
  </small>

>>> for el in soup.find_all("small"):
...     el.clear()
...
>>> soup
<a>
    Final
  </a>
  definition.
  <small></small>

>>> soup.get_text()
'\n\n\n    Final\n  \n  definition.\n  \n\n'
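
As a self-contained sketch of the same idea, the script below applies the clear() approach and then collapses the leftover whitespace. It also shows a variant that leaves the tree untouched by skipping any string that has a small ancestor. The div wrapper, the href value, and the helper name text_without are assumptions introduced for illustration, not part of the original markup or of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <div> wrapper and href are assumptions.
HTML = """
<div>
  <a href="#">
    Final
  </a>
  definition.
  <small>
    Fun fact.
  </small>
</div>
"""


def text_without(tag, excluded="small"):
    """Return the text under `tag`, skipping strings inside `excluded` tags.

    Hypothetical helper; it reads the tree without modifying it.
    """
    parts = [
        s for s in tag.find_all(string=True)
        if s.find_parent(excluded) is None
    ]
    # Collapse runs of whitespace left over from the source indentation.
    return " ".join("".join(parts).split())


soup = BeautifulSoup(HTML, "html.parser")
wrapper = soup.find("div")

# Variant 1: non-destructive, the <small> content stays in the tree.
non_destructive = text_without(wrapper)

# Variant 2: the approach from this answer, plus whitespace normalization.
# clear() empties each <small> tag in place, so the tree is modified.
for el in wrapper.find_all("small"):
    el.clear()
cleared = " ".join(wrapper.get_text().split())

print(non_destructive)
print(cleared)
```

Both variants should print the same cleaned-up text. The first is the safer choice if the small content is needed again later, since clear() permanently empties those tags.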
