utter_step

Reputation: 673

How to act like a logged-in user while crawling the site?

There is a website I want to crawl that uses POST authentication.

Given a login and password for it, how can I crawl the closed sections of this site?

Upvotes: 3

Views: 7814

Answers (3)

jon

Reputation: 6246

You ought to watch the recent PyCon talk by Asheesh Laroia titled "Web scraping: Reliably and efficiently pull data".

The lecture runs 2h39m, but it covers a lot and moves at a friendly pace. In fact, it's one of the best programming videos I've ever seen.

Upvotes: 2

Maria Zverina

Reputation: 11173

Granted, you could use urllib2 to do the POST authentication and the crawling. But if you haven't already learned urllib2, you are probably much better off using the nice requests library.

You can find instructions and a really nice tutorial at http://docs.python-requests.org/en/latest/index.html.

To install the package, run pip install requests. On Mac or other Unix systems, installing into the system-wide Python usually requires prefixing the command with sudo, like this: sudo pip install requests
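A minimal sketch of the requests approach, assuming a plain HTML login form whose credentials are POSTed as form fields and whose server answers with a session cookie. The function name logged_in_session, the field names "username"/"password", and any URLs are hypothetical placeholders; check the site's login form for the real action URL and input names (real forms often carry extra hidden fields, which would also need to go into the payload).

```python
import requests


def logged_in_session(login_url, username, password,
                      user_field="username", pass_field="password"):
    """Return a requests.Session that has already POSTed the login form.

    Hypothetical helper: the default field names are assumptions and must
    match the name attributes of the site's real login <input> elements.
    """
    session = requests.Session()
    # The form fields are sent as application/x-www-form-urlencoded data.
    payload = {user_field: username, pass_field: password}
    response = session.post(login_url, data=payload)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    # The Session keeps any cookies the server set (e.g. a session id)
    # and sends them automatically on every later request it makes.
    return session
```

Usage would then look like s = logged_in_session("https://example.com/login", "my_login", "my_password") followed by s.get("https://example.com/members/page").text, where both URLs are made-up examples. Reusing one Session for the whole crawl is what makes every request look like it comes from the logged-in user.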

Upvotes: 3

David Kroukamp

Reputation: 36423

This similar question might help: How to use Python to login to a webpage and retrieve cookies for later usage? So might this one: Python Site Login. Finally, Login to website using python shows how to log in and then reuse the logged-in cookie for the rest of the session, which lets you parse/scrape the 'closed' sections. Have a look at urllib too for more help.
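For the standard-library route, here is a rough sketch using Python 3's urllib.request (the successor of urllib2) together with http.cookiejar: an opener with an HTTPCookieProcessor stores the cookie returned by the login POST and sends it on every later request. The helper names and the "username"/"password" field names are hypothetical; adjust them to the site's actual login form.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_cookie_opener():
    """Return (opener, jar): an opener that remembers cookies across requests."""
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor saves Set-Cookie headers from responses into `jar`
    # and adds matching Cookie headers to later requests through this opener.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def encode_login_form(username, password,
                      user_field="username", pass_field="password"):
    """URL-encode the login form; the field names are assumptions."""
    form = {user_field: username, pass_field: password}
    return urllib.parse.urlencode(form).encode("ascii")


def log_in(opener, login_url, username, password):
    """POST the credentials; passing `data` makes urllib send a POST."""
    with opener.open(login_url, data=encode_login_form(username, password)) as resp:
        resp.read()  # the cookies, not the body, are what we need here
    # Any later opener.open(...) call now carries the session cookie.
```

A crawl would then call opener, jar = build_cookie_opener(), then log_in(opener, login_url, "my_login", "my_password"), and fetch the closed pages with opener.open(url).read(), where login_url and url stand for the site's real addresses.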

Upvotes: 1
