Reputation: 362
Hello experts, I am working on a challenging task. This is the HTML I have:
<strong>SC/ST:</strong> Minimum 18 Years and Maximum 35 Years<br />
<strong>OBC (Non-Creamy Layer):</strong> Minimum 18 Years and Maximum 33 Years </span></p>
<p><span style="font-family:arial; font-size:small"><span style="font-size:medium"><strong>Facebook Details Data: </strong></span>Data is always gathered valid <a href="https://www.facebook.com/users/09" ><strong>Facebook Web Scraping</strong></a></span></p>
<p><span style="font-family:arial; font-size:small"> </span></p>
<p><span style="font-size:large"><span style="font-family:arial"><span style="font-size:small"><strong><span style="font-size:medium">Districts:</span> </strong>Candidates only from the following districts of Assam can apply for these posts: <br />
This is the output I am trying to achieve. The complete element that contains facebook.com should be removed, so the third line of the HTML above should be dropped because it has facebook.com in it:
<strong>SC/ST:</strong> Minimum 18 Years and Maximum 35 Years<br />
<strong>OBC (Non-Creamy Layer):</strong> Minimum 18 Years and Maximum 33 Years </span></p>
<p><span style="font-family:arial; font-size:small"> </span></p>
<p><span style="font-size:large"><span style="font-family:arial"><span style="font-size:small"><strong><span style="font-size:medium">Districts:</span> </strong>Candidates only from the following districts of Assam can apply for these posts: <br />
This is the code I have tried:
getDetails = soup2.find('div', class_='post-body entry-content')
toRemove = "www.facebook.com"
try:
    for headless in (getDetails for getDetails in getDetails.find_all('a') if any( getDetails.find(toRemove))):
        headless.decompose()
except:
    print("facebook not found")
However, this code isn't working: the output always still contains facebook.com. I have tried everything I can think of, but nothing works. Please help me achieve this. Thanks.
Upvotes: 0
Views: 39
Reputation: 3400
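Try using .parents, which yields the ancestor tags of an element, starting with the nearest one. Choose the appropriate ancestor from that sequence and call its decompose() method to remove it together with everything inside it.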
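link = soup.find("a")
# For the HTML in the question, parents[0] is the enclosing <span> and parents[1] is the <p>.
if link is not None and "facebook.com" in link.get("href", ""):
    main_parent_tag = list(link.parents)[1]
    main_parent_tag.decompose()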
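The snippet above only looks at the first link on the page. As a rough sketch of a more general version, the block below checks every paragraph and removes each one that contains a link whose href mentions facebook.com. It also avoids the problem in the question's attempt, where find(toRemove) searches for a tag named "www.facebook.com" instead of inspecting the href attribute, so it returns None, and any(None) then raises a TypeError that the bare except swallows. The sample markup is a trimmed, well-formed version of the question's HTML, and the helper name remove_paragraphs_linking_to is made up for illustration. It assumes the unwanted link always sits inside a p element.

```python
from bs4 import BeautifulSoup

# Trimmed, well-formed version of the HTML from the question (an assumption:
# the real page may nest these paragraphs differently).
html = """
<div class="post-body entry-content">
<p><span><strong>SC/ST:</strong> Minimum 18 Years and Maximum 35 Years<br />
<strong>OBC (Non-Creamy Layer):</strong> Minimum 18 Years and Maximum 33 Years </span></p>
<p><span><strong>Facebook Details Data: </strong>Data is always gathered valid <a href="https://www.facebook.com/users/09"><strong>Facebook Web Scraping</strong></a></span></p>
<p><span> </span></p>
<p><span><strong>Districts:</strong> Candidates only from the following districts of Assam can apply for these posts: <br /></span></p>
</div>
"""


def remove_paragraphs_linking_to(soup, needle):
    """Remove every <p> containing an <a> whose href includes `needle`.

    Hypothetical helper for illustration; returns the number of
    paragraphs removed.
    """

    def href_matches(href):
        # BeautifulSoup passes None when the attribute is missing.
        return href is not None and needle in href

    # Collect the matches first, then modify the tree.
    doomed = [
        p for p in soup.find_all("p")
        if p.find("a", href=href_matches) is not None
    ]
    for p in doomed:
        p.decompose()
    return len(doomed)


soup = BeautifulSoup(html, "html.parser")
removed = remove_paragraphs_linking_to(soup, "facebook.com")
print(removed)
print(soup)
```

If the link can appear outside a p element, the same idea should work with a different ancestor: pick the tag name that wraps the unwanted block on your page and pass it to find_all instead of "p".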
Upvotes: 1