Олег

Reputation: 191

How to remove previous siblings in BeautifulSoup

I am trying to remove the previous siblings above the <hr /> tag and the next siblings below the </h2> tag. The problem is that I get this error: AttributeError: 'NavigableString' object has no attribute 'decompose'

The HTML that I am trying to parse looks something like this:

<h1>Heading text</h1>

<p style="text-align: justify;">this and everything untop i want to delete</p>
<hr />
<p style="margin: 0px; font-size: 12px; font-family: Helvetica;"> this and text below i want to keep</p>

<p style="margin: 0px; font-size: 12px; font-family: Helvetica;"> text tex text</p>

<h2>Heading 2</h2>

<p> this and everything below i want to remove</p>

Feeding in the HTML given above does not remove the siblings; it only raises the AttributeError. What am I doing wrong, and how can I solve this problem?

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')

for prev_sibling in soup.find("hr").previous_siblings:
    prev_sibling.decompose()

for next_sibling in soup.find("h2").next_siblings:
    prev_sibling.decompose()

Upvotes: 1

Views: 756

Answers (1)

KunduK

Reputation: 33384

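Use find_previous_siblings() and find_next_siblings(). The .previous_siblings and .next_siblings generators also yield the NavigableString text nodes (such as the newlines between tags), and those have no decompose() method. Called without arguments, the find_* methods return a list containing only Tag objects, so decompose() works on every item. Note also that the second loop in the question calls prev_sibling.decompose() where it should call next_sibling.decompose().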

from bs4 import BeautifulSoup
html='''<h1>Heading text</h1>
<p style="text-align: justify;">this and everything untop i want to delete</p>
<hr />
<p style="margin: 0px; font-size: 12px; font-family: Helvetica;"> this and text below i want to keep</p>
<p style="margin: 0px; font-size: 12px; font-family: Helvetica;"> text tex text</p>
<h2>Heading 2</h2>
<p> this and everything below i want to remove</p>'''

soup = BeautifulSoup(html, 'lxml')

for prev_sibling in soup.find("hr").find_previous_siblings():
    prev_sibling.decompose()

for next_sibling in soup.find("h2").find_next_siblings():
    next_sibling.decompose()

print(soup)
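
If the leftover text nodes (the newlines between the removed tags) should go as well, one possible variation is to keep the sibling generators but call extract(), which is available on both Tag and NavigableString. The sketch below is only an illustration: the helper name remove_all, the shortened HTML, and the use of 'html.parser' instead of 'lxml' are assumptions made to keep it self-contained.

```python
from bs4 import BeautifulSoup

# Shortened version of the HTML from the question (assumed for illustration)
html = """<h1>Heading text</h1>
<p>delete me</p>
<hr />
<p>keep me</p>
<h2>Heading 2</h2>
<p>remove me</p>"""

# 'html.parser' is used here only to avoid the lxml dependency
soup = BeautifulSoup(html, "html.parser")


def remove_all(nodes):
    """Hypothetical helper: detach every node, whether tag or text."""
    # Copy into a list first so that removing nodes does not
    # disturb the generator while it is being iterated.
    for node in list(nodes):
        node.extract()  # defined for Tag and NavigableString alike


remove_all(soup.find("hr").previous_siblings)
remove_all(soup.find("h2").next_siblings)

remaining_tags = [tag.name for tag in soup.find_all(True)]
print(remaining_tags)
print(soup)
```

Unlike decompose(), extract() only detaches the node from the tree and returns it, which is sufficient here because the removed nodes are simply discarded.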

Upvotes: 2
