Reputation: 191
I am trying to remove all siblings above the <hr /> tag and all siblings below the </h2> tag. The problem is that I get this error: AttributeError: 'NavigableString' object has no attribute 'decompose'
The HTML that I am trying to parse looks something like this:
<h1>Heading text</h1>
<p style="text-align: justify;">this and everything untop i want to delete</p>
<hr />
<p style="margin: 0px; font-size: 12px; font-family: Helvetica;"> this and text below i want to keep</p>
<p style="margin: 0px; font-size: 12px; font-family: Helvetica;"> text tex text</p>
<h2>Heading 2</h2>
<p> this and everything below i want to remove</p>
Feeding the HTML above to the code below does not remove the siblings; it only raises the AttributeError. What am I doing wrong, and how can I solve this problem?
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
for prev_sibling in soup.find("hr").previous_siblings:
    prev_sibling.decompose()
for next_sibling in soup.find("h2").next_siblings:
    prev_sibling.decompose()
Upvotes: 1
Views: 756
Reputation: 33384
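Use find_previous_siblings() and find_next_siblings(). The .previous_siblings and .next_siblings generators also yield the text nodes between tags (the newlines, which are NavigableString objects with no decompose() method). Called with no arguments, the find_* methods return only Tag objects.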
from bs4 import BeautifulSoup
html='''<h1>Heading text</h1>
<p style="text-align: justify;">this and everything untop i want to delete</p>
<hr />
<p style="margin: 0px; font-size: 12px; font-family: Helvetica;"> this and text below i want to keep</p>
<p style="margin: 0px; font-size: 12px; font-family: Helvetica;"> text tex text</p>
<h2>Heading 2</h2>
<p> this and everything below i want to remove</p>'''
soup = BeautifulSoup(html, 'lxml')
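# find_previous_siblings() returns a list of Tag objects only, so decompose() is safe
for prev_sibling in soup.find("hr").find_previous_siblings():
    prev_sibling.decompose()
# Likewise, remove every tag that follows the <h2>
for next_sibling in soup.find("h2").find_next_siblings():
    next_sibling.decompose()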
print(soup)
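If the leftover text nodes should go as well, a rough sketch along these lines may help. It first shows what .previous_siblings actually yields, then detaches every sibling with extract(), which exists on both Tag and NavigableString. The shortened HTML, the remove_all helper, and the use of 'html.parser' (to avoid the lxml dependency) are assumptions made for illustration, not part of the original code.

```python
from bs4 import BeautifulSoup

html = """<h1>Heading text</h1>
<p>delete me</p>
<hr />
<p>keep me</p>
<p>keep me too</p>
<h2>Heading 2</h2>
<p>remove me</p>"""

# Assumption: 'html.parser' is used here only so the sketch runs without lxml.
soup = BeautifulSoup(html, "html.parser")

# .previous_siblings yields every node, including the "\n" text nodes between
# tags; those are NavigableString objects, which have no decompose().
kinds = [type(node).__name__ for node in soup.find("hr").previous_siblings]
print(kinds)


def remove_all(nodes):
    """Hypothetical helper: detach every node from the tree."""
    # list() takes a snapshot first, so the tree is not modified while the
    # sibling generator is still walking it.
    for node in list(nodes):
        node.extract()  # available on both Tag and NavigableString


remove_all(soup.find("hr").previous_siblings)
remove_all(soup.find("h2").next_siblings)
print(soup)
```

extract() only detaches a node from the tree, while decompose() also destroys it; for a one-off cleanup like this, either should do, as long as the call is only made on objects that support it.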
Upvotes: 2