goodname

Reputation: 49

Remove content inside span

This looks easy, but I haven't managed to find a solution. I tried other proposed solutions, such as span.clear(), but they didn't work.

The page's structure:

<div class="details">           
  <h2>Public function</h2>  
  <div class="token">
    <h2>Name person</h2>
    <h3>Name person</h3>
    <p>
        <span>NO</span>NO</p>
    <p>
        <span>Time of Death:</span>13:38:00</p>

Current result:

Time of Death: 13:38:00

Desired result:

13:38:00

My code:

whole_section = soup.find('div', {'class':"token"}) # Access to whole section
name_person = whole_section.h2.text  # Select person's name, inside "h2" tag.
time_decease = whole_section.h3.next_sibling.next_sibling.next_sibling.next_sibling.text  # The time has no tag of its own, so I had to chain "next_sibling".

Upvotes: 0

Views: 294

Answers (1)

esqew

Reputation: 44701

I wouldn't recommend traversing the DOM by chaining next_sibling calls. In my experience, doing so makes your script prone to breaking on even the smallest changes to the source HTML.
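
To illustrate that fragility, here is a minimal sketch assuming two versions of the same markup, one indented and one with the whitespace removed. The `pretty` and `compact` strings and the `fourth_sibling` helper are hypothetical names made up for this example; they are not part of the question.

```python
from bs4 import BeautifulSoup

# Same elements, different whitespace (both strings are made up for this sketch).
pretty = '''<div class="token">
    <h3>Name person</h3>
    <p><span>NO</span>NO</p>
    <p><span>Time of Death:</span>13:38:00</p>
</div>'''
compact = ('<div class="token"><h3>Name person</h3><p><span>NO</span>NO</p>'
           '<p><span>Time of Death:</span>13:38:00</p></div>')


def fourth_sibling(markup):
    """Follow next_sibling four times from the <h3>, as the question's code does."""
    node = BeautifulSoup(markup, 'html.parser').h3
    for _ in range(4):
        if node is None:
            break
        node = node.next_sibling
    return node


# With indentation, whitespace text nodes sit between the tags,
# so four hops land on the second <p>.
print(fourth_sibling(pretty))
# Without indentation there are no whitespace nodes, so the chain runs off the end.
print(fourth_sibling(compact))
```

Nothing about the elements changed between the two strings, yet the sibling chain only reaches the target in one of them.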

Instead, find the parent <p></p> you're after by passing find() a lambda function that filters on the contents of the <p></p> itself (specifically, the 'Time of Death:' string). Then find every <span></span> inside that <p></p> and remove it with decompose(), which leaves only the text you're after:

html = '''<div class="details">           
  <h2>Public function</h2>  
  <div class="token">
    <h2>Name person</h2>
    <h3>Name person</h3>
    <p>
        <span>NO</span>NO</p>
    <p>
        <span>Time of Death:</span>13:38:00</p>
  </div>
</div>'''

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

whole_section = soup.find('div', {'class':"token"}) # Access to whole section
name_person = whole_section.h2.text  # Select person's name, inside "h2" tag.
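# Find the <p> whose text contains the 'Time of Death:' label.
time_decease = whole_section.find(lambda element: element.name == 'p' and 'Time of Death:' in element.text)
# Remove the label <span> so only the time itself remains.
for span in time_decease.find_all('span'):
    span.decompose()

print(name_person)
print(time_decease.text.strip())  # strip() drops the whitespace left before the time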
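
If you'd rather not modify the parsed tree, a possible alternative is to read only the strings that are direct children of the <p></p>, which skips the text inside the <span></span>. This is a sketch rather than the only way to do it; `direct_text` is a hypothetical helper name, and the HTML is a trimmed copy of the snippet from the question.

```python
from bs4 import BeautifulSoup

html = '''<div class="token">
    <h2>Name person</h2>
    <p>
        <span>Time of Death:</span>13:38:00</p>
</div>'''

soup = BeautifulSoup(html, 'html.parser')


def direct_text(tag):
    """Join the strings that are direct children of `tag`, ignoring nested tags.

    Hypothetical helper for this example, not a BeautifulSoup function.
    """
    return ''.join(tag.find_all(string=True, recursive=False)).strip()


# Same lookup as above: the <p> that contains the label text.
paragraph = soup.find(lambda el: el.name == 'p' and 'Time of Death:' in el.text)
time_of_death = direct_text(paragraph)

print(time_of_death)
print(paragraph.span.text)  # the label <span> is still in the tree
```

Because nothing is decomposed, the same soup object can still be queried for the label afterwards.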


Upvotes: 1
