Jay Setti

Reputation: 181

Python: scraping Facebook comments from a website

I have been trying to scrape Facebook comments from the page below using Beautiful Soup.

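import urllib2

import BeautifulSoup  # BeautifulSoup 3 (Python 2)

url = 'http://techcrunch.com/2012/05/15/facebook-lightbox/'

# Download the raw HTML and parse it
fd = urllib2.urlopen(url)
soup = BeautifulSoup.BeautifulSoup(fd)

# soup(...) returns a list of matching tags, so take the first text node of each
fb_comments = [div.find(text=True) for div in soup("div", {"class": "postText"})]

print fb_comments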

The output is empty. However, I can clearly see the Facebook comments inside those tags when I use Inspect Element on the TechCrunch page. I am fairly new to Python: is this approach correct, and where am I going wrong?

Upvotes: 3

Views: 4094

Answers (3)

Lynx-Lab

Reputation: 795

As Christopher and ThiefMaster say, this is caused by JavaScript.

If you really need that information, you can still retrieve it by rendering the page with Selenium (http://seleniumhq.org) and then running BeautifulSoup on the rendered output.
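
A rough Python 3 sketch of the Selenium route (the question's code is Python 2). The names `render_with_selenium`, `extract_texts`, and `ClassTextCollector` are hypothetical, the `postText` class comes from the question, and Firefox is an arbitrary browser choice. The standard-library `html.parser` stands in for BeautifulSoup only to keep the sketch free of extra dependencies; the rendered HTML can be handed to BeautifulSoup in the same way. Note that `page_source` is read right after the page loads, so comments fetched later by AJAX or rendered inside an iframe may still be missing; an explicit wait or `driver.switch_to.frame(...)` would probably be needed in that case.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text inside every <div> whose class list contains class_name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0   # greater than 0 while inside a matching <div>
        self.texts = []  # one string per matching <div>

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.class_name in classes:
            if self.depth == 0:
                self.texts.append("")  # start a new comment
            self.depth += 1            # track nested <div>s too

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


def extract_texts(html, class_name="postText"):
    """Return the stripped text of each <div class="...class_name...">."""
    parser = ClassTextCollector(class_name)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]


def render_with_selenium(url):
    """Load url in a real browser so its JavaScript runs; return the DOM as HTML."""
    from selenium import webdriver  # third-party, imported only when needed

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are installed
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()
```

Usage would look like `extract_texts(render_with_selenium(url))`, which requires Selenium and a browser driver to be installed.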

Upvotes: 1

Christopher Hackett

Reputation: 6192

The parts of the page you are looking for are not included in the page source. You can see this for yourself by opening the page source in a browser.

You will need to use something like pywebkitgtk to execute the JavaScript before passing the document to BeautifulSoup.
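
The view-source check described above can also be done from a script. This is a minimal Python 3 sketch using only the standard library; `fetch_raw_html` and `is_in_static_source` are hypothetical helper names, and `postText` is the class name from the question.

```python
import urllib.request


def fetch_raw_html(url, timeout=10):
    """Return the HTML exactly as the server sends it, before any JavaScript runs."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def is_in_static_source(html, marker):
    """True if marker (for example a class name) occurs anywhere in the raw HTML."""
    return marker in html
```

If `is_in_static_source(fetch_raw_html(url), "postText")` is false while the browser's element inspector shows the class, the content was added after load by JavaScript.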

Upvotes: 0

ThiefMaster

Reputation: 318518

Facebook comments are loaded dynamically using AJAX. You can scrape the original page to retrieve this tag:

<fb:comments href="http://techcrunch.com/2012/05/15/facebook-lightbox/" num_posts="25" width="630"></fb:comments>

After that, you need to send a request to a Facebook API that returns the comments for the URL in that tag's href attribute.
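
The first step, pulling the `fb:comments` tag out of the static page, might look like the Python 3 sketch below. `FbCommentsFinder` and `find_fb_comments` are hypothetical names, and only the standard-library `html.parser` is used. The follow-up request to Facebook is deliberately left out, because the answer does not name a specific endpoint.

```python
from html.parser import HTMLParser


class FbCommentsFinder(HTMLParser):
    """Record the attributes of every <fb:comments> tag in a document."""

    def __init__(self):
        super().__init__()
        self.found = []  # one attribute dict per <fb:comments> tag

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names and keeps the "fb:" prefix
        if tag == "fb:comments":
            self.found.append(dict(attrs))


def find_fb_comments(html):
    """Return a list of attribute dicts, one per <fb:comments> tag in html."""
    finder = FbCommentsFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

Each returned dict carries the `href` that the comments are attached to, which is the value a later API request would need.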

Upvotes: 0
