Reputation: 362
How can I remove an element from a webpage when it has no ID or class?
This is the website:
https://www.sentinelassam.com/north-east-india-news/assam-news/40-new-colleges-to-be-set-up-in-btc-assam-minister-himanta-biswa-sarma-516420
The element I want to remove has this structure:
<p><b>Also Read <a href="https://www.sentinelassam.com/national-news/iisfs-vigyan-yatra-flagged-off-from-various-indian-cities-516407">IISF's 'Vigyan Yatra' flagged off from various Indian cities</a></b></p>
The Also Read text is common to all of the occurrences on the page. If I can remove one element, then a loop should remove all of them (I think).
Can the whole element be removed using the Also Read text? I tried to use decompose(), but where do I apply the decompose() method?
Upvotes: 0
Views: 278
Reputation: 331
Try this.
from simplified_scrapy import SimplifiedDoc
xml = '''
<p><b>Also Read <a href="https://www.sentinelassam.com/national-news/iisfs-vigyan-yatra-flagged-off-from-various-indian-cities-516407">IISF's 'Vigyan Yatra' flagged off from various Indian cities</a></b></p>
'''
# To remove only the <b> tag (the <p> wrapper stays in the document):
doc = SimplifiedDoc(xml)
b = doc.getElementByText('Also Read', tag='b')
b.remove()
print(doc.html)
# To remove the whole <p> tag instead, start again from a fresh document:
doc = SimplifiedDoc(xml)
p = doc.getElementByText('Also Read', tag='p')
p.remove()
print(doc.html)
Upvotes: 1
Reputation: 2865
Use the developer tools in Chrome or any other browser. Find the element that you want to remove, right-click it, and choose Copy > Copy selector. This will give you a selector like this:
#details-page-infinite-scrolling-data > div.article > div.article-text-desc > div > div > p:nth-child(22) > b > a
This selector can probably be simplified. Now you can use BeautifulSoup to remove it:
selector = "#details-page-infinite-scrolling-data > div.article > div.article-text-desc > div > div > p:nth-child(22) > b > a"
soup.select_one(selector).decompose()
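I haven't tested it. Note that this selector ends at the a tag, so it removes only the link inside one paragraph; to remove the whole paragraph, drop the trailing "> b > a" from the selector.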
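If the copied selector turns out to be too brittle (it targets a single p:nth-child position), matching on the Also Read text may be simpler. Below is a minimal sketch, assuming BeautifulSoup (bs4) is installed. The sample HTML and the helper name remove_also_read are made up for illustration and are not taken from the actual page, so the real markup may need a different check.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that mimics the structure from the question.
html = """
<div>
<p>First paragraph of the article.</p>
<p><b>Also Read <a href="https://example.com/a">Story A</a></b></p>
<p>Second paragraph of the article.</p>
<p><b>Also Read <a href="https://example.com/b">Story B</a></b></p>
</div>
"""


def remove_also_read(soup, marker="Also Read"):
    """Remove every <p> whose visible text starts with `marker`.

    Returns the number of paragraphs removed.
    """
    removed = 0
    # find_all returns a list-like snapshot, so removing tags while
    # looping over it is safe.
    for p in soup.find_all("p"):
        if p.get_text(strip=True).startswith(marker):
            p.decompose()  # deletes the <p> and everything inside it
            removed += 1
    return removed


soup = BeautifulSoup(html, "html.parser")
count = remove_also_read(soup)
remaining = [p.get_text() for p in soup.find_all("p")]
print(count, remaining)
```

The point is that decompose() is called on the tag object you want gone, here each matching p, rather than on the soup itself. On the real page you would build soup from the downloaded HTML instead of the sample string.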
Upvotes: 1