Sainita

Reputation: 362

Unable to locate html element in Beautiful Soup

Hello experts, I am working on a challenging task. This is the HTML I have:

<strong>SC/ST:</strong> Minimum 18 Years and Maximum 35 Years<br />
<strong>OBC (Non-Creamy Layer):</strong> Minimum 18 Years and Maximum 33 Years&nbsp;</span></p>

<p><span style="font-family:arial; font-size:small"><span style="font-size:medium"><strong>Facebook Details Data: </strong></span>Data is always gathered valid <a href="https://www.facebook.com/users/09" ><strong>Facebook Web Scraping</strong></a></span></p>

<p><span style="font-family:arial; font-size:small">&nbsp;</span></p>

<p><span style="font-size:large"><span style="font-family:arial"><span style="font-size:small"><strong><span style="font-size:medium">Districts:</span> </strong>Candidates only from the following districts of Assam can apply for these posts: &nbsp;<br />

This is the output I am trying to achieve. The complete element that contains facebook.com should be removed, i.e. the third line of the HTML above, since it links to facebook.com:

<strong>SC/ST:</strong> Minimum 18 Years and Maximum 35 Years<br />
<strong>OBC (Non-Creamy Layer):</strong> Minimum 18 Years and Maximum 33 Years&nbsp;</span></p>
    
<p><span style="font-family:arial; font-size:small">&nbsp;</span></p>

<p><span style="font-size:large"><span style="font-family:arial"><span style="font-size:small"><strong><span style="font-size:medium">Districts:</span> </strong>Candidates only from the following districts of Assam can apply for these posts: &nbsp;<br />

This is the code I have tried:

getDetails = soup2.find('div', class_='post-body entry-content')
toRemove = "www.facebook.com"
try:
    for headless in (getDetails for getDetails in getDetails.find_all('a') if any( getDetails.find(toRemove))):
        headless.decompose()
except:
    print("facebook not found")

But this code isn't working: the output always still contains facebook.com. I have tried everything, but nothing works for me. It's quite a challenge. Please help me achieve the goal. Thanks.

Upvotes: 0

Views: 39

Answers (1)

Bhavya Parikh

Reputation: 3400

Try using .parents, which yields every ancestor of a tag, starting with the closest one. Pick the appropriate ancestor from that sequence and call decompose() on it:

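link = soup.find("a")  # first <a> in the document
if link is not None and "facebook.com" in link.get("href", ""):
    # list(link.parents)[0] is the enclosing <span>; [1] is the <p> around it
    main_parent_tag = list(link.parents)[1]
    main_parent_tag.decompose()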
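
As for why the code in the question never removes anything: getDetails.find(toRemove) looks for a tag named "www.facebook.com" rather than matching the href, so it returns None, any(None) raises a TypeError, and the bare except hides it. If the page may contain several such links, a more general variant is sketched below. It is only a sketch: the html string is a shortened stand-in for the question's markup, the helper name remove_paragraphs_linking_to is hypothetical, and it assumes every offending link sits inside a <p> that should be dropped as a whole.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the markup in the question (an assumption, not the real page).
html = """<div class="post-body entry-content">
<p><span><strong>SC/ST:</strong> Minimum 18 Years and Maximum 35 Years</span></p>
<p><span><strong>Facebook Details Data: </strong>Data is always gathered valid <a href="https://www.facebook.com/users/09"><strong>Facebook Web Scraping</strong></a></span></p>
<p><span><strong>Districts:</strong> Candidates only from the following districts of Assam can apply for these posts:</span></p>
</div>"""


def remove_paragraphs_linking_to(soup, domain):
    """Remove each <p> holding an <a> whose href contains `domain`.

    Hypothetical helper; returns the number of elements removed.
    """
    targets = []
    # href=True skips anchors that have no href attribute at all.
    for link in soup.find_all("a", href=True):
        if domain not in link["href"]:
            continue
        target = link.find_parent("p")
        if target is None:
            # Link is not inside a <p>: fall back to removing just the link.
            target = link
        # Collect each element once, so a <p> with two links is not removed twice.
        if not any(target is seen for seen in targets):
            targets.append(target)

    # Decompose only after the search is finished, so no removed tag is inspected.
    for target in targets:
        target.decompose()
    return len(targets)


soup = BeautifulSoup(html, "html.parser")
removed = remove_paragraphs_linking_to(soup, "facebook.com")
print(removed)
print(soup)
```

find_parent("p") is used here instead of indexing into list(link.parents), so the result does not depend on how many <span> wrappers sit between the link and its paragraph.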

Upvotes: 1
