Reputation: 547
I am accessing an https page through a proxy:
def read_page(self, url):
    '''
    Gets web page using proxy and returns beautifulsoup object
    '''
    soup = None
    try:
        r = requests.get(url, proxies=PROXIES, auth=PROXY_AUTH,
                         cert = ('../static/crawlera-ca.crt'), verify=False, allow_redirects=False)
    except requests.exceptions.MissingSchema:
        return False
    if r.status_code == 200:
        soup = bs4.BeautifulSoup(r.text, "html.parser")
    if soup:
        return soup
    return False
I am passing "https://www.bestbuy.com" as the url. I get this error:
requests.exceptions.SSLError: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(336265225, '[SSL] PEM lib (_ssl.c:2964)'),))
When I remove the cert = ('../static/crawlera-ca.crt') argument, the program accesses the site successfully and gives me an 'InsecureRequestWarning', which is expected. But I don't understand why the other error happens. The certificate file is in the right place in my folder hierarchy and was downloaded from the proxy service, so I know it's right.
The easy option would be to just not use the certificate and suppress the security warning, but I want to do it properly. Can anyone explain what is going on and how I can fix it?
Upvotes: 1
Views: 8596
Reputation: 123270
I think you misunderstood the meaning of the cert parameter. It is not the list of trusted CAs, as you seem to think. It is the client certificate you use to authenticate yourself to the server, and such a certificate also requires a matching private key. Your CA file contains no private key, which is most likely why loading it as a client certificate fails with the PEM lib error.
Given that it works without this parameter, the server obviously does not need a client certificate from you (requiring one is uncommon anyway). You probably meant to use ../static/crawlera-ca.crt as the list of trusted CAs for certificate validation. In that case, do not use the cert parameter; use the verify parameter like this:
r = requests.get(url, proxies=PROXIES, auth=PROXY_AUTH,
                 verify='../static/crawlera-ca.crt',
                 allow_redirects=False)
For more information, see the requests documentation on the cert parameter (authentication with client certificates) and on the verify parameter (server certificate validation).
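To make the distinction concrete, here is a minimal sketch of a helper that keeps the two roles apart. The name tls_kwargs and its parameters are hypothetical and not part of requests; the helper only builds the verify and cert keyword arguments that requests.get() accepts, assuming verify takes True or a CA bundle path and cert takes either a single combined PEM path or a (certificate, key) tuple.

```python
from typing import Optional


def tls_kwargs(ca_bundle: Optional[str] = None,
               client_cert: Optional[str] = None,
               client_key: Optional[str] = None) -> dict:
    """Build the TLS keyword arguments for requests.get() (hypothetical helper).

    ca_bundle   -> verify: CA file used to validate the *server's* certificate.
    client_cert -> cert:   certificate presented *by the client*, if the server asks for one.
    client_key  -> private key matching client_cert.
    """
    if client_key and not client_cert:
        raise ValueError("client_key given without client_cert")

    # verify=True uses the default CA store; a path replaces it with that bundle.
    kwargs = {"verify": ca_bundle if ca_bundle else True}

    if client_cert and client_key:
        # Separate certificate and key files are passed as a tuple.
        kwargs["cert"] = (client_cert, client_key)
    elif client_cert:
        # A single file must contain both the certificate and its private key.
        kwargs["cert"] = client_cert
    return kwargs


# Usage in the question's setting (PROXIES and PROXY_AUTH come from the question):
#   r = requests.get(url, proxies=PROXIES, auth=PROXY_AUTH, allow_redirects=False,
#                    **tls_kwargs(ca_bundle='../static/crawlera-ca.crt'))
```

With only ca_bundle set, no cert argument is sent at all, which matches the case here: the proxy's CA file validates the server side, and no client certificate is involved.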
Upvotes: 2