Reputation: 673
I need to crawl a website that uses POST authentication.
Given a login and password for it, how can I crawl the closed sections of this site?
Upvotes: 3
Views: 7814
Reputation: 6246
You ought to watch the recent PyCon talk by Asheesh Laroia titled "Web scraping: Reliably and efficiently pull data".
The lecture is 2h39m long, but it covers a lot at a friendly pace. In fact, it's one of the best programming videos I've ever seen.
Upvotes: 2
Reputation: 11173
Granted, you could use urllib2 to do the POST authentication and the crawling. But if you haven't already learned urllib2, you are probably much better off using the nice requests library.
You can find instructions and a really nice tutorial at http://docs.python-requests.org/en/latest/index.html.
To install the package, run pip install requests. On Mac or other Unix systems, you may need to prefix the command with sudo, like this: sudo pip install requests
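As a rough illustration of the requests approach, here is a minimal sketch. It assumes the site accepts a plain form POST and keeps you logged in with a session cookie. The URLs, the form field names ("username", "password"), and the helper name fetch_protected are all hypothetical; check the site's actual login form for the real field names.

```python
import requests


def fetch_protected(session, login_url, protected_url, username, password):
    """Log in with a form POST, then fetch a page that requires the login.

    The field names below are assumptions; inspect the real login form
    to find the names the site expects.
    """
    payload = {"username": username, "password": password}

    # The session stores any cookies set by the login response.
    login_response = session.post(login_url, data=payload, timeout=10)
    login_response.raise_for_status()

    # Later requests on the same session send those cookies back.
    page = session.get(protected_url, timeout=10)
    page.raise_for_status()
    return page.text


# Example usage (hypothetical URLs and credentials):
#
# with requests.Session() as session:
#     html = fetch_protected(
#         session,
#         "https://example.com/login",
#         "https://example.com/members",
#         "my_login",
#         "my_password",
#     )
```

Reusing one Session object for every request is what carries the login over to the closed sections. Some sites also require a hidden token from the login form, which this sketch does not handle.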
Upvotes: 3
Reputation: 36423
These similar questions might help: "How to use Python to login to a webpage and retrieve cookies for later usage?", "Python Site Login", and "Login to website using python". The last one shows how to log in and reuse the logged-in cookie for the rest of the session, which lets you parse or scrape the 'closed' sections. Have a look at the urllib documentation too for more help.
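The cookie-reuse idea can be sketched with the standard library alone. This version uses Python 3, where urllib2 was split into urllib.request and urllib.error. The helper names (build_cookie_opener, encode_form, login) and the form field names in the usage comment are hypothetical, and the sketch assumes the site tracks the login with ordinary cookies.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_cookie_opener():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def encode_form(fields):
    """Encode a dict as a URL-encoded request body in bytes."""
    return urllib.parse.urlencode(fields).encode("utf-8")


def login(opener, login_url, fields):
    """POST the login form; passing data makes urllib send a POST request."""
    with opener.open(login_url, data=encode_form(fields)) as response:
        return response.status


# Example usage (hypothetical URLs and field names):
#
# opener, jar = build_cookie_opener()
# login(opener, "https://example.com/login",
#       {"username": "my_login", "password": "my_password"})
# with opener.open("https://example.com/members") as response:
#     html = response.read().decode("utf-8")
```

Every request made through the same opener sends back the cookies collected at login, which is what gives access to the closed pages.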
Upvotes: 1