SSLError with scrapy in jupyter notebook

I'm using Jupyter Notebook to extract items from a web page. For some pages I'm getting an error and can't load the content of the page. This is the code I'm using:

import requests  
from scrapy.http import TextResponse

start_url = 'https://www.insulation-expo.com/exhibito...16_72.html?offset=0&az=B&aid=34908&return=MzY6TDJWNGFHbGlhWFJ2TGk0dU1UWmZOekl1YUhSdGJEOXZabVp6WlhROU1DWmhlajFD#content'    
r = requests.get(start_url)
response = TextResponse(r.url, body=r.text, encoding='utf-8')

And the Error I'm getting:

SSLError: hostname 'www.insulation-expo.com' doesn't match either of 'www.reedexpo.de', 'reedexpo.de'

I can open the page in the scrapy shell, though:

scrapy shell  'https://www.insulation-expo.com/exhibito...16_72.html?offset=0&az=B&aid=34908&return=MzY6TDJWNGFHbGlhWFJ2TGk0dU1UWmZOekl1YUhSdGJEOXZabVp6WlhROU1DWmhlajFD#content'

Upvotes: 0

Views: 182

Answers (1)

Steffen Ullrich

Reputation: 123320

The problem is that your client is not using Server Name Indication (SNI), i.e. it is not sending the target hostname within the SSL handshake. SNI is needed to distinguish different hosts on the same IP address already during the SSL handshake, so that the server can provide the correct certificate. Without SNI, an SSL client gets the certificate for www.reedexpo.de on this IP address. By including the hostname www.insulation-expo.com in the SSL handshake via SNI, the client instead gets the certificate which is valid for this hostname.
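To see the difference yourself, you can inspect which certificate the server presents with and without SNI. This is only a minimal sketch using Python's standard ssl module; the hostname is taken from the question, and the rest is an illustration rather than the exact code path used by requests or scrapy:

import socket
import ssl

host = 'www.insulation-expo.com'

# With SNI: server_hostname is sent in the handshake, so the server can
# return the certificate that matches www.insulation-expo.com.
ctx = ssl.create_default_context()
with socket.create_connection((host, 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname=host) as tls:
        print('with SNI:   ', tls.getpeercert()['subject'])

# Without SNI: the server falls back to its default certificate
# (www.reedexpo.de in the question), which is what triggers the SSLError.
# Hostname checking is disabled here only so the certificate can be inspected.
ctx_no_sni = ssl.create_default_context()
ctx_no_sni.check_hostname = False
with socket.create_connection((host, 443)) as sock:
    with ctx_no_sni.wrap_socket(sock) as tls:
        print('without SNI:', tls.getpeercert()['subject'])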

There are numerous hits when searching for scrapy sni, and from this information one might assume that the issue was fixed with either scrapy version 1.0.0 (2015-06-19) or 1.1.0 (2016-05-11). So please check that your scrapy version is recent enough.
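You can check the installed version directly from a notebook cell, for example (a minimal sketch):

import scrapy
print(scrapy.__version__)  # should be at least 1.1.0 if SNI support is the issue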

Upvotes: 2
