Marvel

Reputation: 29

Problem extracting text with Beautiful Soup in Python

I'm trying to extract text from a forum website. It works well, but if a comment contains two lines, only the first line is extracted. See the example below.

<div class="wwCommentBody">             
   <blockquote class="postcontent restore " style="padding: 10px;">Happy birthday bro! <br>
    Have a nice day <img src="images/emoji/smile.png" border="0" alt="" title="Smile" 
    class="inlineimg"> 
     </blockquote>            
</div>

import requests
from bs4 import BeautifulSoup

# headers and cookies are defined elsewhere
r = requests.get("https://example.com/threads/73956/page2", headers=headers, cookies=cookies)
soup = BeautifulSoup(r.content, "html.parser")
comments = soup.find_all('div',{'class':'wwCommentBody'})
for div in comments:
    text = (div.find('blockquote',{'class':'postcontent restore'}))
    first_child = next(text.children, None)
    if first_child is not None:
        print(first_child.string.strip())

Upvotes: 1

Views: 50

Answers (1)

Ram

Reputation: 4779

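Just extract the blockquote and print its text. The .text property joins every string inside the tag, so both lines of the comment are returned instead of only the first child.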

for div in comments:
    bq = div.find('blockquote',{'class':'postcontent restore'})
    print(bq.text)
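
If the whitespace left behind by the <br> and <img> tags is a problem, a possible variation is sketched below. It runs against the sample markup from the question rather than the live page. The helper name extract_comments is made up for illustration. stripped_strings is Beautiful Soup's generator of whitespace-trimmed text fragments.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question, so no network request is needed.
html = """
<div class="wwCommentBody">
   <blockquote class="postcontent restore " style="padding: 10px;">Happy birthday bro! <br>
    Have a nice day <img src="images/emoji/smile.png" border="0" alt="" title="Smile"
    class="inlineimg">
     </blockquote>
</div>
"""


def extract_comments(markup):
    """Hypothetical helper: return one list of text lines per comment."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for div in soup.find_all("div", class_="wwCommentBody"):
        bq = div.find("blockquote", class_="postcontent")
        if bq is None:
            continue  # comment without a blockquote body
        # stripped_strings skips whitespace-only fragments and trims the rest
        results.append(list(bq.stripped_strings))
    return results


comments = extract_comments(html)
for lines in comments:
    print("\n".join(lines))
```

Calling bq.get_text(separator="\n", strip=True) should give the same lines joined into a single string, if a list is not needed.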

Upvotes: 2
