smurfit89

Reputation: 337

How to extract a comment with beautifulsoup?

I am new to Python and to data mining in general, so I have a question about extracting part of some output. I am using Python 3.6 and updated all my packages this morning. I have anonymized the output and removed all lines containing passwords, tokens and so on.

from bs4 import BeautifulSoup

soup = BeautifulSoup(open("facebookoutput.html"), "html.parser")

comments = soup.findAll('div', class_="_2b06")

print(comments[0]) # show print of first entry:

<div class="_2b06"><div class="_2b05"><a href="/stuartd?fref=nf&amp;rc=p&amp;__tn__=R-R">some Name </a></div><div data-commentid="100000000000000000222222000000000000000" data-sigil="comment-body">There is nice comment. I like stackoverflow. </div></div>

I am stuck trying to get `There is nice comment. I like stackoverflow.` out of it.

Thanks in advance.

Upvotes: 0

Views: 662

Answers (1)

SIM

Reputation: 22440

Try this:

from bs4 import BeautifulSoup

content="""
<div class="_2b06"><div class="_2b05"><a href="/stuartd?fref=nf&amp;rc=p&amp;__tn__=R-R">some Name </a></div><div data-commentid="100000000000000000222222000000000000000" data-sigil="comment-body">There is nice comment. I like stackoverflow. </div></div>
"""

soup = BeautifulSoup(content, "html.parser")
comments = ' '.join([item.text for item in soup.select("[data-sigil='comment-body']")])
print(comments)

Output:

There is nice comment. I like stackoverflow.
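
If you would rather keep each comment separate instead of joining them into one string, something along these lines might work. It is only a sketch against the single snippet from the question: it assumes every comment sits in a `div` with class `_2b06`, that the author is the first `a` inside it, and that the text lives in the `data-sigil="comment-body"` div. The names `html`, `results`, `block`, `body` and `author` are my own, not anything from the page.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (one comment block).
html = """
<div class="_2b06"><div class="_2b05"><a href="/stuartd?fref=nf&amp;rc=p&amp;__tn__=R-R">some Name </a></div><div data-commentid="100000000000000000222222000000000000000" data-sigil="comment-body">There is nice comment. I like stackoverflow. </div></div>
"""

soup = BeautifulSoup(html, "html.parser")

results = []
# Assumption: each comment is wrapped in a div with class "_2b06".
for block in soup.find_all("div", class_="_2b06"):
    # The comment text is in the div carrying data-sigil="comment-body".
    body = block.find("div", attrs={"data-sigil": "comment-body"})
    if body is None:
        continue  # skip blocks that have no comment body
    # Assumption: the first link inside the block is the author's name.
    author = block.find("a")
    results.append({
        "author": author.get_text(strip=True) if author else None,
        "comment": body.get_text(strip=True),
    })

print(results)
```

`get_text(strip=True)` trims the surrounding whitespace, so the trailing space from the markup is gone. With a real page you would pass the opened file to `BeautifulSoup` as in the question instead of the inline string.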

Upvotes: 1
