Reputation: 29
I'm trying to extract text from a forum website. It works well, but if a comment contains two lines, only the first line is extracted. See the example below:
<div class="wwCommentBody">
<blockquote class="postcontent restore " style="padding: 10px;">Happy birthday bro! <br>
Have a nice day <img src="images/emoji/smile.png" border="0" alt="" title="Smile"
class="inlineimg">
</blockquote>
</div>
import requests
from bs4 import BeautifulSoup

# headers and cookies are defined elsewhere
r = requests.get("https://example.com/threads/73956/page2", headers=headers, cookies=cookies)
soup = BeautifulSoup(r.content, "html.parser")
comments = soup.find_all('div', {'class': 'wwCommentBody'})
for div in comments:
    text = div.find('blockquote', {'class': 'postcontent restore'})
    first_child = next(text.children, None)
    if first_child is not None:
        print(first_child.string.strip())
Upvotes: 1
Views: 50
Reputation: 4779
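Just extract the blockquote and print its text. Taking only the first child stops at the text node before the <br>, whereas .text concatenates every text node inside the element.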
for div in comments:
    bq = div.find('blockquote', {'class': 'postcontent restore'})
    print(bq.text)
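If the raw .text output carries stray spaces or blank lines around the <br> and the image, get_text with a separator and strip=True may be tidier. The following is only a sketch run against the HTML from the question; the helper name comment_texts and the inline html string are assumptions made for illustration, not part of the original code.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (stands in for r.content).
html = """
<div class="wwCommentBody">
<blockquote class="postcontent restore " style="padding: 10px;">Happy birthday bro! <br>
Have a nice day <img src="images/emoji/smile.png" border="0" alt="" title="Smile"
class="inlineimg">
</blockquote>
</div>
"""

def comment_texts(soup):
    """Return the full text of each comment, one line per text node."""
    texts = []
    for div in soup.find_all("div", class_="wwCommentBody"):
        bq = div.find("blockquote", class_="postcontent")
        if bq is None:
            # Skip comment bodies that have no blockquote.
            continue
        # strip=True trims each text node and drops the empty ones;
        # separator="\n" puts each remaining node on its own line.
        texts.append(bq.get_text(separator="\n", strip=True))
    return texts

soup = BeautifulSoup(html, "html.parser")
for text in comment_texts(soup):
    print(text)
```

The None check guards against pages where a comment body lacks the expected blockquote, which would otherwise raise an AttributeError.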
Upvotes: 2