NTLM authentication with Scrapy for web scraping

Question

I am attempting to scrape data from a website that requires authentication.
I have been able to log in successfully using requests and HttpNtlmAuth with the following:

import requests
from requests_ntlm import HttpNtlmAuth

s = requests.session()
url = "https://website.com/things"
response = s.get(url, auth=HttpNtlmAuth(r'DOMAIN\USERNAME', 'PASSWORD'))

I would like to explore the capabilities of Scrapy; however, I have not been able to authenticate with it.

I came across the following middleware, which seems like it could work, but I do not think I have been using it properly:

https://github.com/reimund/ntlm-middleware/blob/master/ntlmauth.py

In my settings.py I have

SPIDER_MIDDLEWARES = { 'test.ntlmauth.NtlmAuthMiddleware': 400, }

and in my spider class I have

http_user = r'DOMAIN\USER'
http_pass = 'PASS'

I have not been able to get this to work.

If anyone who has successfully scraped a website with NTLM authentication can point me in the right direction, I would appreciate it.

Voldemort · Accepted Answer

I was able to figure out what was going on.

1: This is a downloader middleware, not a spider middleware, so it belongs in DOWNLOADER_MIDDLEWARES rather than SPIDER_MIDDLEWARES:

DOWNLOADER_MIDDLEWARES = { 'test.ntlmauth.NTLM_Middleware': 400, }

2: The middleware which I was trying to use needed to be modified significantly. Here is what works for me:

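import requests
from requests_ntlm import HttpNtlmAuth
from scrapy.http import Response


class NTLM_Middleware(object):

    def process_request(self, request, spider):
        # Credentials are read from attributes set on the spider.
        usr = getattr(spider, 'http_user', '')
        pwd = getattr(spider, 'http_pass', '')
        # Fetch the page with requests, which performs the NTLM handshake.
        s = requests.session()
        response = s.get(request.url, auth=HttpNtlmAuth(usr, pwd))
        # Returning a Response here makes Scrapy skip its own download.
        return Response(request.url, status=response.status_code,
                        headers={}, body=response.content)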

Within the spider, all you need to do is set these variables:

http_user = r'DOMAIN\USER'
http_pass = 'PASS'
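
A note on the user name string, assuming Python 3: a plain literal with a single backslash before USER does not compile there, because \U is read as the start of a Unicode escape. A raw string (or a doubled backslash) keeps the separator intact. The sketch below uses placeholder values, and split_ntlm_user is a hypothetical helper that exists only to show what the string ends up containing:

```python
# Placeholder credentials; substitute your own domain and account.
http_user = r'DOMAIN\USER'   # raw string: the backslash is kept literally
http_pass = 'PASS'


def split_ntlm_user(value):
    """Hypothetical helper: split 'DOMAIN\\USER' into (domain, user)."""
    domain, sep, user = value.partition('\\')
    # Without a backslash, treat the whole value as the user name.
    return (domain, user) if sep else ('', value)


print(split_ntlm_user(http_user))
```

The raw string and 'DOMAIN\\USER' are the same value, so either spelling can be used for http_user in the spider.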
