Near

Reputation: 401

Scraping a site with alert window authentication

I'm trying to write a Python app that scrapes my university's LMS to check for new files and, if there are any, downloads them to my local directories.

Access to the page is protected by my login and password, but the site has no HTML login form. Instead, the browser shows an alert-style popup where I enter my login and password (like the prompt you usually get when logging into a router), and I'm not sure how to proceed from there.

Could someone help me out or point me to a resource on how to authenticate against this type of site before trying to scrape it? Either with mechanize or something else.

Thanks.

Upvotes: 3

Views: 2457

Answers (3)

Will Blanton

Reputation: 51

The answer posted by Near is probably the best option. I've been looking everywhere for a while now and have never been able to get the normal "http://user:pass@url" form to work for me. Using the requests_ntlm library is the ONLY thing I've been able to get to work in my particular project, so I HIGHLY recommend checking it out if you are having issues with HTTP authentication.

Upvotes: 0

Near

Reputation: 401

For anyone interested:

I found a way to do this using beautifulsoup, requests and requests_ntlm libraries.

Upvotes: 3

esfy

Reputation: 683

I think that is HTTP Basic authentication. Check whether you can log in by entering

http(s)://(username):(password)@(url) in your browser's address bar.

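If that's the case, enable the corresponding middleware in settings.py. Note that the setting is a dict named DOWNLOADER_MIDDLEWARES, and that in Scrapy 1.0 and later the module lives under scrapy.downloadermiddlewares rather than scrapy.contrib.downloadermiddleware (the middleware is also enabled by default):

DOWNLOADER_MIDDLEWARES = {'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware': 300}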

And use the middleware like this in your spider:

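import scrapy

class TheSpider(scrapy.Spider):
    name = 'thespider'  # placeholder; every spider needs a name

    # Credentials picked up by HttpAuthMiddleware
    http_user = 'username'
    http_pass = 'password'

    def parse(self, response):
        # do the magic!
        pass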

This is covered in the Scrapy documentation on HttpAuthMiddleware.
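
If you'd rather test the Basic auth theory outside Scrapy first, here is a minimal standard-library sketch. The URL and credentials are placeholders, and basic_auth_header and make_opener are just illustrative helper names. The first function shows what the browser popup ends up sending (a base64-encoded "username:password" pair), and the second lets urllib answer the server's 401 challenge for you.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the Authorization header value that HTTP Basic auth sends."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def make_opener(url, username, password):
    """Return an opener that retries with credentials after a 401 challenge."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # Passing None as the realm makes the credentials apply to any realm at url.
    manager.add_password(None, url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(manager)
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    # Hypothetical values: replace with the real site and credentials.
    url = "https://lms.example.edu/"
    opener = make_opener(url, "username", "password")
    with opener.open(url, timeout=30) as response:
        print(response.status)
```

If this still comes back with a 401, the site is probably using a different scheme (Digest or NTLM, for example), and Basic credentials alone won't be enough.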

Upvotes: 5
