pafcu

Reputation: 8128

Scrape Facebook in Python

I'm interested in getting the number of friends each of my friends on Facebook has. Apparently the official Facebook API does not allow getting the friends of friends, so I need to get around this (somewhat sensible) limitation somehow. I tried the following:

import urllib, urllib2, cookielib

username = '[email protected]'
password = 'mypassword'

# Keep cookies between requests so the login session carries over
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# POST the credentials to the login page with a browser-like User-Agent
login_data = urllib.urlencode({'email' : username, 'pass' : password})
request = urllib2.Request('https://login.facebook.com/login.php')
request.add_header('User-Agent','Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) Gecko/20101027 Fedora/3.6.12-1.fc14 Firefox/3.6.12')
opener.open(request, login_data)

# Fetch the home page using the cookies set during login
resp = opener.open('http://facebook.com')
print resp.read()

but I only end up with a captcha page. Any idea how FB is detecting that the request is not from a "normal" browser? I could add an extra step and solve the captcha, but that would add unnecessary complexity to the program, so I would rather avoid it. When I use a web browser with the same User-Agent string, I don't get a captcha.

Alternatively, does anyone have any saner ideas on how to accomplish my goal, i.e. get a list of friends of friends?

Upvotes: 7

Views: 2841

Answers (2)

dwarfy

Reputation: 3076

I tried a lot of ways to scrape Facebook, and the only one that worked for me was this:

Install Selenium: the Firefox plugin, the server, and the Python client library. With the Firefox plugin you can record the actions you perform to log in and export them as a Python script. Use that script as the base for your work and it will work. I added a request to my web server at the start of the script to fetch a list of things to inspect on FB, and at the end of the script I send the results back to my server.

I could NOT find a way to do it directly from my server with a browser simulator such as mechanize or anything similar! I guess it needs to be done from a client browser.
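For illustration, here is a rough sketch of what such a login script might look like. It is not the script the recorder exports: it assumes the current selenium Python package and its WebDriver API (Python 3) rather than the plugin and server setup described above. The login URL and the field names 'email' and 'pass' are taken from the question's code, and whether they match the live form is an assumption. The helpers make_driver and log_in are hypothetical names.

```python
LOGIN_URL = "https://login.facebook.com/login.php"  # URL taken from the question


def make_driver():
    """Start a real Firefox session (needs the selenium package and a Firefox driver)."""
    # Imported lazily so log_in() can be exercised without selenium installed.
    from selenium import webdriver
    return webdriver.Firefox()


def log_in(driver, username, password, login_url=LOGIN_URL):
    """Fill in the login form through the browser and return the resulting page HTML."""
    driver.get(login_url)
    # "name" is the string value of selenium's By.NAME locator strategy.
    # The field names are assumed from the POST data used in the question.
    driver.find_element("name", "email").send_keys(username)
    password_field = driver.find_element("name", "pass")
    password_field.send_keys(password)
    password_field.submit()  # submits the form that contains this field
    return driver.page_source
```

To try it, call log_in(make_driver(), username, password) and close the browser with driver.quit() when finished. Because the requests come from a real browser, the page sees the same cookies, headers, and JavaScript behaviour as a normal visit.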

Upvotes: 0

Marcelo Cantos

Reputation: 185862

Have you tried tracing and comparing the HTTP transactions with Fiddler2 or Wireshark? Fiddler can even trace HTTPS, as long as your client code can be made to work with bogus certs.
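Once both requests are captured, the comparison itself can be automated. The following is a minimal Python 3 sketch, assuming you have copied the raw request headers out of the tracing tool as text. The helpers parse_headers and diff_headers are hypothetical names, and the two sample captures are made up for illustration, not real Facebook traffic.

```python
def parse_headers(raw):
    """Parse 'Name: value' lines into a dict keyed by lower-cased header name."""
    headers = {}
    for line in raw.strip().splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as the request line
            headers[name.strip().lower()] = value.strip()
    return headers


def diff_headers(browser, script):
    """Return (only in browser, only in script, present in both but different)."""
    only_browser = sorted(set(browser) - set(script))
    only_script = sorted(set(script) - set(browser))
    changed = sorted(
        name for name in set(browser) & set(script)
        if browser[name] != script[name]
    )
    return only_browser, only_script, changed


# Made-up captures for illustration only.
browser_raw = """
User-Agent: Mozilla/5.0 (X11; Linux x86_64) Firefox/3.6.12
Accept: text/html
Accept-Language: en-us,en;q=0.5
Cookie: example=1
"""
script_raw = """
User-Agent: Mozilla/5.0 (X11; Linux x86_64) Firefox/3.6.12
Accept-Encoding: identity
"""

only_browser, only_script, changed = diff_headers(
    parse_headers(browser_raw), parse_headers(script_raw)
)
print("Sent only by the browser:", only_browser)
print("Sent only by the script:", only_script)
print("Different values:", changed)
```

Headers or cookies that only the browser sends are the first candidates to add to the script. The same comparison can be applied to the POST body fields.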

Upvotes: 3
