Reputation: 10373
I'm using a Jupyter notebook to extract items from a web page. For some pages I can't load the content. This is the code I'm using:
import requests
from scrapy.http import TextResponse
start_url = 'https://www.insulation-expo.com/exhibito...16_72.html?offset=0&az=B&aid=34908&return=MzY6TDJWNGFHbGlhWFJ2TGk0dU1UWmZOekl1YUhSdGJEOXZabVp6WlhROU1DWmhlajFD#content'
r = requests.get(start_url)
response = TextResponse(r.url, body=r.text, encoding='utf-8')
And the error I'm getting:
SSLError: hostname 'www.insulation-expo.com' doesn't match either of 'www.reedexpo.de', 'reedexpo.de'
I can open the page in the Scrapy shell, though:
scrapy shell 'https://www.insulation-expo.com/exhibito...16_72.html?offset=0&az=B&aid=34908&return=MzY6TDJWNGFHbGlhWFJ2TGk0dU1UWmZOekl1YUhSdGJEOXZabVp6WlhROU1DWmhlajFD#content'
Upvotes: 0
Views: 182
Reputation: 123320
The problem is that your client is not using Server Name Indication (SNI), i.e. it does not send the target hostname within the SSL handshake. SNI is needed to distinguish different hosts on the same IP address during the handshake, so that the server can provide the correct certificate. Without SNI, an SSL client gets the certificate for www.reedexpo.de on this IP address. By including the hostname www.insulation-expo.com in the SSL handshake using SNI, the client instead gets the certificate that is valid for this hostname.
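To make the SNI mechanism concrete, here is a minimal offline sketch using only Python's standard ssl module (Python 3). It builds the first handshake message (the ClientHello) in memory, without contacting any server, and shows that the hostname is only present when server_hostname is passed. The helper name client_hello_bytes is my own for illustration and is not part of requests or Scrapy.

```python
import ssl


def client_hello_bytes(hostname):
    """Return the raw ClientHello a TLS client would send first.

    Nothing goes over the network: the handshake is driven against
    in-memory buffers, so only the very first message is produced.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Hostname checking must be off to allow server_hostname=None,
    # which is how the "no SNI" case is simulated here.
    ctx.check_hostname = False
    incoming = ssl.MemoryBIO()   # bytes "received" from the server (none)
    outgoing = ssl.MemoryBIO()   # bytes the client wants to send
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client has written its ClientHello and now
        # waits for a server reply that never arrives in this sketch.
        pass
    return outgoing.read()


if __name__ == "__main__":
    host = b"www.insulation-expo.com"
    with_sni = client_hello_bytes(host.decode("ascii"))
    without_sni = client_hello_bytes(None)
    print("Interpreter built with SNI support:", ssl.HAS_SNI)
    print("Hostname in ClientHello with SNI:   ", host in with_sni)
    print("Hostname in ClientHello without SNI:", host in without_sni)
```

A server hosting several sites on one IP address can only pick the matching certificate if that hostname is present in the ClientHello; otherwise it falls back to a default certificate, which is what the error message in the question suggests happened.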
There are numerous hits when searching for "scrapy sni", and from this information one might assume that the issue was fixed in either Scrapy 1.0.0 (2015-06-19) or 1.1.0 (2016-05-11). So please check that your Scrapy version is recent enough.
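One way to do that check from within the notebook is sketched below (Python 3.8+). The helpers installed_version and meets_minimum are hypothetical names of my own, and the 1.1.0 threshold is only the later of the two releases mentioned above, not a confirmed fix version.

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(version, minimum=(1, 1, 0)):
    """Compare the leading numeric parts of a version string to a minimum.

    The default minimum is an assumption taken from the release dates
    mentioned in this answer, not a verified threshold.
    """
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    parts += [0] * (3 - len(parts))   # pad "1.1" to (1, 1, 0)
    return tuple(parts) >= tuple(minimum)


if __name__ == "__main__":
    found = installed_version("scrapy")
    if found is None:
        print("Scrapy is not installed in this environment")
    else:
        print("Scrapy", found, "recent enough:", meets_minimum(found))
```

Run it in the same kernel as the failing code, because the notebook and the scrapy shell command may be using different Python environments.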
Upvotes: 2