Blankman

Reputation: 267190

How to spider a password-protected site in Python?

Currently I have a spider written in Java (using HtmlUnit) that logs into a supplier website and crawls it.

It keeps the session (cookie) and even lets me enable or disable JavaScript, etc.

I also use HTMLParser (Java) to help parse the HTML and extract the relevant information.

Does Python have something similar for this?

Upvotes: 1

Views: 2022

Answers (2)

ebt

Reputation: 1358

Scrapy is a crawling framework that wires up several parsers and helper routines for you. (It does its networking through Twisted rather than urllib2.)

Upvotes: 1

Stephen

Reputation: 49226

Python has urllib2 (urllib.request in Python 3) for fetching pages; it supports password authentication and, combined with cookielib (http.cookiejar in Python 3), cookies.
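As a rough sketch of the cookie-keeping part, here is how a session might look with the Python 3 standard library (urllib.request and http.cookiejar, the successors of urllib2 and cookielib). The helper names `build_session_opener` and `login`, and the form field names `username` and `password`, are assumptions for illustration; a real site will use its own login URL and field names.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_session_opener():
    """Return an opener that remembers cookies between requests, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login(opener, login_url, username, password):
    """POST a login form through the opener.

    The field names 'username' and 'password' are hypothetical; check the
    site's login form for the real ones.
    """
    form = urllib.parse.urlencode({"username": username, "password": password})
    # Supplying data makes this a POST. Any session cookie set by the response
    # is stored in the jar and sent on later requests made with this opener.
    return opener.open(login_url, data=form.encode("utf-8"))
```

After a successful `login(...)`, later calls to `opener.open(url)` should carry the session cookie, which is roughly what HtmlUnit does for you in Java. For sites that use HTTP basic authentication instead of a form, `urllib.request.HTTPBasicAuthHandler` can be added to the same opener.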

There is also an HTMLParser module for parsing HTML, but some people prefer the more full-featured BeautifulSoup.
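For the parsing side, a minimal sketch with the standard-library parser (html.parser in Python 3) that collects link targets from a page could look like the following. The class name `LinkExtractor` and the helper `extract_links` are made up for this example.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the href values found in an HTML string, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

A spider would feed each fetched page to `extract_links` and queue the results for crawling. BeautifulSoup does the same job with less code and is more forgiving of messy markup.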

Upvotes: 4
