Remove content inside span

Question

It looks quite easy, but I haven't managed to find a solution. I tried other proposed solutions, such as span.clear(), but that didn't work.

Page structure:

           
  Public function  
  
    Name person
    Name person
    
        NONO
    
        Time of Death:13:38:00

Result:

Time of Death: 13:38:00

Desired result:

13:38:00

My code:

whole_section = soup.find('div', {'class':"token"}) # Access to whole section
name_person = whole_section.h2.text  # Select person's name, inside "h2" tag.
time_decease = whole_section.h3.next_sibling.next_sibling.next_sibling.next_sibling.text # The time itself has no tag of its own, so I had to chain "next_sibling".

esqew · Accepted Answer

I wouldn't really ever recommend traversing the DOM by repeatedly trying to get the next sibling - in my experience, every time you do this it makes your script more and more prone to breakages for the smallest changes in the source HTML.
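To illustrate that fragility, here is a minimal sketch. The markup is assumed for the example (it is not the real page), and after_two_hops is a hypothetical helper name. The same two-hop sibling chain returns different content as soon as one extra element appears between the tags:

```python
from bs4 import BeautifulSoup

def after_two_hops(html):
    """Hypothetical helper: follow .next_sibling twice from the first <h3>."""
    soup = BeautifulSoup(html, "html.parser")
    # The first hop lands on the newline text node, the second on the next tag.
    node = soup.find("h3").next_sibling.next_sibling
    return node.get_text()

# Assumed markup for illustration only.
original = (
    '<div class="token"><h3>NONO</h3>\n'
    '<p><span>Time of Death:</span>13:38:00</p></div>'
)
# Same page with one extra element inserted after the <h3>.
changed = (
    '<div class="token"><h3>NONO</h3>\n'
    '<em>new</em>\n'
    '<p><span>Time of Death:</span>13:38:00</p></div>'
)

print(after_two_hops(original))  # text of the <p>
print(after_two_hops(changed))   # text of the <em>, not the <p>
```

Whitespace between tags counts as a sibling too, which is why the chain in the question needs so many hops in the first place.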

Instead, find the parent <p> you're after by using a lambda function to filter based on the contents of the <p> itself (the 'Time of Death:' string, specifically); then loop through the child elements of that element and remove the <span> to extract what you're after:

html = '''           
  Public function  
  
    Name person
    Name person
    
        NONO
    
        Time of Death:13:38:00
  
'''

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

whole_section = soup.find('div', {'class':"token"}) # Access to whole section
name_person = whole_section.h2.text  # Select person's name, inside "h2" tag.
time_decease = whole_section.find(lambda element: element.name == 'p' and 'Time of Death:' in element.text)
for span in time_decease.find_all('span'):
  span.decompose()

print(name_person)
print(time_decease.text)
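
For a version that runs end to end, here is a minimal sketch of the same approach. The markup is an assumption inferred from the selectors used above (a div with class "token", an h2, and a p whose span holds the label), so the real page may differ. extract_time is a hypothetical helper name, not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup

# Assumed markup, inferred from the selectors above; the real page may differ.
html = """
<div class="token">
  <h2>Name person</h2>
  <p>
    <span>Time of Death:</span>13:38:00
  </p>
</div>
"""

def extract_time(section):
    """Hypothetical helper: text of the 'Time of Death:' <p> without its <span> label."""
    p = section.find(lambda el: el.name == "p" and "Time of Death:" in el.get_text())
    if p is None:
        return None  # no matching paragraph on this page
    for span in p.find_all("span"):
        span.decompose()  # removes the label from the tree for good
    return p.get_text(strip=True)

soup = BeautifulSoup(html, "html.parser")
section = soup.find("div", {"class": "token"})
name_person = section.h2.get_text(strip=True)
time_decease = extract_time(section)

print(name_person)
print(time_decease)
```

Note that decompose() modifies the parsed tree, so read the label first if you still need it. get_text(strip=True) also trims the whitespace left behind around the time.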

