user3136030

Reputation: 384

How to skip a particular tag and get the text of the other tags in BeautifulSoup

I am crawling a webpage with BeautifulSoup. I want to skip the content of one particular tag and get the text of everything else inside its parent. In the HTML below, I don't want the contents of the div tag, but I couldn't work out how to exclude it.

HTML:

<blockquote class="messagetext">
    <div style="margin: 5px; float: right;">
        unwanted text .....
    </div>
    Text..............
    <a class="externalLink" rel="nofollow" target="_blank" href="#">text </a>
    <a class="externalLink" rel="nofollow" target="_blank" href="#">text</a>
    <a class="externalLink" rel="nofollow" target="_blank" href="#">text</a>
    ,text
</blockquote>

I tried this:

content = soup.find('blockquote',attrs={'class':'messagetext'}).text

But this also returns the unwanted text inside the div tag.

Upvotes: 0

Views: 581

Answers (1)

PepperoniPizza

Reputation: 9112

Use the clear() method, which removes a tag's contents, on every div inside the blockquote before reading the text:

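from bs4 import BeautifulSoup

# html_doc holds the HTML from the question
soup = BeautifulSoup(html_doc, 'html.parser')
content = soup.find('blockquote', attrs={'class': 'messagetext'})

# Empty every nested div so its text no longer appears in content.text
for tag in content.find_all('div'):
    tag.clear()

print(content.text)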

This yields:

Text..............
text 
text
text
   ,text
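
As a variation on the above, here is a minimal sketch that removes the div from the tree entirely with decompose() instead of only emptying it, then collapses the leftover whitespace with get_text(). The html_doc string is copied from the question; the variable name cleaned is only illustrative, and the choice of the built-in html.parser is an assumption rather than something the question specifies.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question
html_doc = """
<blockquote class="messagetext">
    <div style="margin: 5px; float: right;">
        unwanted text .....
    </div>
    Text..............
    <a class="externalLink" rel="nofollow" target="_blank" href="#">text </a>
    <a class="externalLink" rel="nofollow" target="_blank" href="#">text</a>
    <a class="externalLink" rel="nofollow" target="_blank" href="#">text</a>
    ,text
</blockquote>
"""

# Assumption: the standard-library parser is good enough for this markup
soup = BeautifulSoup(html_doc, "html.parser")
content = soup.find("blockquote", attrs={"class": "messagetext"})

# decompose() deletes each direct-child div and everything inside it
for div in content.find_all("div", recursive=False):
    div.decompose()

# Strip each text fragment and join the non-empty ones with a single space
cleaned = content.get_text(" ", strip=True)
print(cleaned)
```

Note that both clear() and decompose() modify the parsed tree in place, so if the div is needed later, parse the document again or work on a copy.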

Upvotes: 2
