How to spider a password protected site in python?

Question

Currently I have a spider written in Java (using HtmlUnit) that logs into a supplier website and spiders it.

It keeps the session (cookie) and even lets me enable/disable JavaScript, etc.

I also use htmlparser (Java) to help parse the HTML and extract the relevant information.

Does python have something similar to do this?

Stephen · Accepted Answer

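Python has urllib2 (renamed urllib.request in Python 3) for fetching pages. It supports password authentication and, together with cookielib (http.cookiejar in Python 3), keeps session cookies across requests.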
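A minimal sketch of a session-keeping opener, written for Python 3 (urllib.request and http.cookiejar). The helper names `build_session_opener` and `form_login` are hypothetical, and the sketch assumes the site either uses HTTP authentication or a plain HTML login form; the form field names you pass in depend entirely on the supplier's login page.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_session_opener(base_url, user=None, password=None):
    """Return (opener, cookie_jar); cookies set by the site are reused on later requests.

    If user/password are given, HTTP Basic credentials are also registered for base_url.
    """
    jar = http.cookiejar.CookieJar()
    handlers = [urllib.request.HTTPCookieProcessor(jar)]
    if user is not None:
        # realm=None means "use these credentials for any realm under base_url"
        pw_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        pw_mgr.add_password(None, base_url, user, password)
        handlers.append(urllib.request.HTTPBasicAuthHandler(pw_mgr))
    opener = urllib.request.build_opener(*handlers)
    return opener, jar


def form_login(opener, login_url, fields):
    """POST a login form (hypothetical field names); the session cookie lands in the jar."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with opener.open(login_url, data=data) as resp:
        return resp.read()
```

Usage would look like `opener, jar = build_session_opener("https://supplier.example.com/")`, then `form_login(opener, "https://supplier.example.com/login", {"username": "me", "password": "secret"})`, after which every `opener.open(url)` call sends the stored session cookie (the URLs and field names here are placeholders). Note that urllib does not execute JavaScript.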

There is also HTMLParser for extracting data from HTML, but some people prefer the more full-featured BeautifulSoup.
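As a rough sketch of how the standard-library parser can drive a spider, the Python 3 code below subclasses `html.parser.HTMLParser` (the `HTMLParser` module in Python 2) to collect links, then walks the site breadth-first. `LinkExtractor`, `extract_links`, `crawl` and its `fetch` parameter are hypothetical names; `fetch(url)` is assumed to be any function that returns a page's HTML, for example one that reads pages through a cookie-aware urllib opener.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect absolute href targets of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop #fragments.
                absolute, _ = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first spider that stays on start_url's host.

    fetch(url) -> str is supplied by the caller (hypothetical hook).
    Returns {url: html} for every page visited.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        for link in extract_links(html, url):
            # Only follow links on the same host that we have not queued yet.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Keeping `fetch` as a parameter separates "how to get a page" (login, cookies) from "what to follow", so the same crawl loop works whether pages come from urllib or from a test fixture.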
