Sainita

Reputation: 362

How to remove an element with no ID or class from a webpage?

How can I remove an element from a webpage when the element has no ID or class?

This is the website:

https://www.sentinelassam.com/north-east-india-news/assam-news/40-new-colleges-to-be-set-up-in-btc-assam-minister-himanta-biswa-sarma-516420

The element I want to remove has this structure:

<p><b>Also Read <a href="https://www.sentinelassam.com/national-news/iisfs-vigyan-yatra-flagged-off-from-various-indian-cities-516407">IISF's 'Vigyan Yatra' flagged off from various Indian cities</a></b></p>

The "Also Read" text is common to every occurrence of this element on the page. If I can remove one of them, I think a loop will remove all of them.

Can the whole element be removed by matching the "Also Read" text? I tried decompose(), but where do I apply the decompose() method?
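One possible place to apply decompose(), sketched with BeautifulSoup (the library the second answer also uses): loop over the <p> tags, check whether the direct <b> child's text starts with "Also Read", and call decompose() on the <p> itself so the whole element goes away. The sample markup is the element from the question plus a hypothetical placeholder paragraph, and remove_also_read is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Sample markup: the element from the question plus a hypothetical placeholder paragraph
html = """
<div>
<p>Article text.</p>
<p><b>Also Read <a href="https://www.sentinelassam.com/national-news/iisfs-vigyan-yatra-flagged-off-from-various-indian-cities-516407">IISF's 'Vigyan Yatra' flagged off from various Indian cities</a></b></p>
</div>
"""


def remove_also_read(soup):
    """Decompose every <p> whose direct <b> child starts with 'Also Read'; return how many were removed."""
    removed = 0
    # find_all() returns a list, so decomposing tags inside the loop is safe
    for p in soup.find_all("p"):
        b = p.find("b", recursive=False)
        if b is not None and b.get_text(strip=True).startswith("Also Read"):
            p.decompose()  # removes the <p> and everything inside it from the tree
            removed += 1
    return removed


soup = BeautifulSoup(html, "html.parser")
count = remove_also_read(soup)
print(count)  # 1
print(soup)
```

For the real page, html would be the downloaded page source (for example requests.get(url).text); the loop then removes every "Also Read" paragraph it finds, not just the first.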

Upvotes: 0

Views: 278

Answers (2)

yazz

Reputation: 331

Try this.

from simplified_scrapy import SimplifiedDoc

xml = '''
<p><b>Also Read <a href="https://www.sentinelassam.com/national-news/iisfs-vigyan-yatra-flagged-off-from-various-indian-cities-516407">IISF's 'Vigyan Yatra' flagged off from various Indian cities</a></b></p>
'''

doc = SimplifiedDoc(xml)
# Option 1: remove only the <b> tag that contains "Also Read"
b = doc.getElementByText('Also Read', tag='b')
b.remove()
print(doc.html)

doc = SimplifiedDoc(xml)
# Option 2: remove the whole <p> tag that contains "Also Read"
p = doc.getElementByText('Also Read', tag='p')
p.remove()
print(doc.html)

Upvotes: 1

snoob dogg

Reputation: 2865

Use the developer tools of Chrome or any other browser. Find the element you want to remove, right-click it and choose Copy > Copy selector. This gives you a selector like this:

#details-page-infinite-scrolling-data > div.article > div.article-text-desc > div > div > p:nth-child(22) > b > a

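This selector can probably be simplified. Note that it ends at the <a> link, so decomposing it would leave the "Also Read" text behind; to remove the whole paragraph, cut it off after p:nth-child(22). Then use BeautifulSoup to remove it:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")  # html holds the page source
selector = "#details-page-infinite-scrolling-data > div.article > div.article-text-desc > div > div > p:nth-child(22)"
soup.select_one(selector).decompose()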

I haven't tested it.
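A copied selector with p:nth-child(22) matches only one paragraph, and select_one() returns None when nothing matches (so .decompose() would raise AttributeError). A variant that removes every "Also Read" paragraph with a single CSS selector might look like the sketch below. It assumes BeautifulSoup's select() is backed by soupsieve 2.1 or newer, which supports :has() and the non-standard :-soup-contains(); the sample markup is the question's element plus a hypothetical placeholder paragraph.

```python
from bs4 import BeautifulSoup

# Sample markup: the element from the question plus a hypothetical placeholder paragraph
html = """
<div>
<p>Article text.</p>
<p><b>Also Read <a href="https://www.sentinelassam.com/national-news/iisfs-vigyan-yatra-flagged-off-from-various-indian-cities-516407">IISF's 'Vigyan Yatra' flagged off from various Indian cities</a></b></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# :has() and :-soup-contains() are handled by soupsieve, the CSS engine behind select();
# :-soup-contains() requires soupsieve 2.1 or newer
selector = 'p:has(> b:-soup-contains("Also Read"))'

# select() returns a list (empty if nothing matches), unlike select_one(), which returns None
matches = soup.select(selector)
for p in matches:
    p.decompose()  # drop the whole paragraph, including the <b> and <a>

print(len(matches))  # 1
print(soup)
```

Matching on the "Also Read" text rather than on a position keeps the selector working when the number of paragraphs before it changes.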

Upvotes: 1
