Elias Zamaria

Reputation: 101083

Verifying HTTPS certificates with urllib.request

I am trying to open an https URL using the urlopen method in Python 3's urllib.request module. It seems to work fine, but the documentation warns that "[i]f neither cafile nor capath is specified, an HTTPS request will not do any verification of the server's certificate".

I am guessing I need to specify one of those parameters if I don't want my program to be vulnerable to man-in-the-middle attacks, problems with revoked certificates, and other vulnerabilities.

cafile and capath are supposed to point to a list of certificates. Where am I supposed to get this list from? Is there any simple and cross-platform way to use the same list of certificates that my OS or browser uses?

Upvotes: 11

Views: 25478

Answers (8)

mariusne

Reputation: 11

I was looking for a way to make this work out-of-the-box, without installing new modules.
I noticed that pip itself maintains an internal certifi module (see Lib/site-packages/pip/_vendor/certifi). Using this one removes the need to install certifi yourself (pip is still required, but most installations already have it).

import ssl
from urllib import request
from pip._vendor import certifi     # use embedded pip._vendor.certifi

ctx = ssl.create_default_context(cafile=certifi.where())
with request.urlopen('https://your-url', context=ctx) as req:
    req.read()
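
If a program should prefer a separately installed certifi but still work when only pip is present, the two imports can be tried in order. The following is a minimal sketch, not a definitive implementation: the helper name find_certifi_bundle is hypothetical, and pip._vendor is an internal package of pip whose layout is not guaranteed to stay stable.

```python
import ssl


def find_certifi_bundle():
    """Return the path to a certifi CA bundle, or None if none is importable."""
    try:
        import certifi  # standalone package, if installed
    except ImportError:
        try:
            from pip._vendor import certifi  # copy vendored inside pip
        except ImportError:
            return None
    return certifi.where()


bundle = find_certifi_bundle()
# With cafile=None, create_default_context() loads the system's default CA store.
ctx = ssl.create_default_context(cafile=bundle)
```

The resulting ctx can be passed to request.urlopen(..., context=ctx) exactly as in the snippet above.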

Upvotes: 1

David Foster

Reputation: 7995

To open an https URL in Python with validation using system certificates (e.g. on Windows or macOS), use:

import ssl
from urllib.request import urlopen

ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
response = urlopen("https://www.example.com", context=ctx)

If there are no system certificates or they aren't in a reliable location, you can use certificates bundled with the certifi package:

import certifi  # 👈
import ssl
from urllib.request import urlopen

ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
ctx.load_verify_locations(cafile=certifi.where())  # 👈

response = urlopen("https://www.example.com", context=ctx)

If you additionally want to let users supply their own certificates, for example when the certifi-bundled certificates become out of date, you can let them set the SSL_CERT_FILE environment variable to the path of a certificate bundle (a convention originating from the OpenSSL library):

import certifi
import os  # 👈
import ssl
from urllib.request import urlopen

ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
ctx.load_verify_locations(cafile=certifi.where())
if (cafile := os.environ.get('SSL_CERT_FILE')) is not None:  # 👈
    ctx.load_verify_locations(cafile=cafile)  # 👈

response = urlopen("https://www.example.com", context=ctx)

All of the above should work on Python 3.8+, or on Python 3.4+ if you rewrite the use of the walrus operator (:=).
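
For interpreters older than 3.8, the walrus expression can be split into a plain assignment followed by a test. A minimal sketch of that rewrite follows. The function name build_context is hypothetical, and certifi is treated as optional here (an assumption, so that the sketch also runs where certifi is not installed).

```python
import os
import ssl


def build_context():
    """Build a verifying SSLContext from system, certifi, and user-supplied CAs."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    try:
        import certifi
    except ImportError:
        certifi = None  # assumption: fall back to system certificates only
    if certifi is not None:
        ctx.load_verify_locations(cafile=certifi.where())
    cafile = os.environ.get('SSL_CERT_FILE')  # plain assignment instead of :=
    if cafile is not None:
        ctx.load_verify_locations(cafile=cafile)
    return ctx
```

Calling urlopen("https://www.example.com", context=build_context()) then behaves like the last example above.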

Upvotes: 0

socketpair

Reputation: 1999

Different Linux distributions store the CA bundle under different file names. I tested on CentOS and Ubuntu. These certificate bundles are updated along with system updates, so you can simply detect which bundle is available and use it with urlopen.

import os
cafile = None
for i in [
    '/etc/ssl/certs/ca-bundle.crt',
    '/etc/ssl/certs/ca-certificates.crt',
]:
    if os.path.exists(i):
        cafile = i
        break
if cafile is None:
    raise RuntimeError('System CA-certificates bundle not found')
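
Once a bundle path has been found, it can be handed to an SSL context and the context passed to urlopen. A minimal sketch under that assumption: first_existing and context_for are hypothetical helper names, and the candidate list contains only the two paths from the loop above.

```python
import os
import ssl


def first_existing(paths):
    """Return the first path in `paths` that exists on disk, else None."""
    for path in paths:
        if os.path.exists(path):
            return path
    return None


def context_for(cafile):
    """Build a verifying SSLContext that trusts the CAs in `cafile`."""
    return ssl.create_default_context(cafile=cafile)


cafile = first_existing([
    '/etc/ssl/certs/ca-bundle.crt',
    '/etc/ssl/certs/ca-certificates.crt',
])
# If cafile is not None:
#     urllib.request.urlopen(url, context=context_for(cafile))
```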

Upvotes: 1

miigotu

Reputation: 1685

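import certifi
import ssl
import urllib.request
try:
    from urllib.request import HTTPSHandler
    # Negotiate the highest protocol version both sides support, but never SSLv2.
    context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
    context.options |= ssl.OP_NO_SSLv2
    # Require a certificate that chains to a CA in certifi's bundle.
    context.verify_mode = ssl.CERT_REQUIRED
    context.load_verify_locations(certifi.where(), None)
    https_handler = HTTPSHandler(context=context, check_hostname=True)
    opener = urllib.request.build_opener(https_handler)
except ImportError:
    # HTTPSHandler is unavailable when Python was built without SSL support.
    opener = urllib.request.build_opener()

# YOUR_USER_AGENT is a placeholder for your own User-Agent string.
opener.addheaders = [('User-agent', YOUR_USER_AGENT)]
urllib.request.install_opener(opener)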
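
On current Python 3 versions, ssl.create_default_context() already turns on certificate verification and hostname checking, so the manual option-setting above can be shortened. A minimal sketch, assuming the system CA store is acceptable when no bundle is given: make_opener and the user-agent string are hypothetical.

```python
import ssl
import urllib.request


def make_opener(user_agent, cafile=None):
    """Return an opener whose HTTPS handler verifies certificates and hostnames."""
    # cafile=None uses the system's default CAs; pass certifi.where() to use certifi.
    context = ssl.create_default_context(cafile=cafile)
    https_handler = urllib.request.HTTPSHandler(context=context)
    opener = urllib.request.build_opener(https_handler)
    opener.addheaders = [('User-agent', user_agent)]
    return opener


opener = make_opener('example-client/1.0')  # placeholder user-agent string
# urllib.request.install_opener(opener) would make urlopen() use it globally.
```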

Upvotes: 2

tzatalin

Reputation: 432

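Works in Python 2.7.9 and above (assumes ssl, certifi and urllib2 are already imported; on Python 3, use urllib.request in place of urllib2):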

context = ssl.create_default_context(cafile=certifi.where())
req = urllib2.urlopen(urllib2.Request(url, body, headers), context=context)

Upvotes: 12

Incinerator

Reputation: 2817

Elias Zamaria's answer still works, but gives a deprecation warning:

DeprecationWarning: cafile, cpath and cadefault are deprecated, use a custom context instead.

I was able to solve the same problem this way instead (using Python 3.7.0):

import ssl
import urllib.request

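# create_default_context() enables certificate and hostname verification;
# a bare ssl.SSLContext(ssl.PROTOCOL_TLSv1) would not verify the server.
ssl_context = ssl.create_default_context()
response = urllib.request.urlopen("https://www.example.com", context=ssl_context)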
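
Whether a given context verifies anything can be checked by inspecting its verify_mode and check_hostname attributes. The sketch below compares a context built directly from the SSLContext constructor with one from ssl.create_default_context(). The helper name verifies is hypothetical, and ssl.PROTOCOL_TLS stands in here for a specific protocol constant such as PROTOCOL_TLSv1.

```python
import ssl
import warnings


def verifies(ctx):
    """Return True if ctx requires a valid certificate and checks the hostname."""
    return ctx.verify_mode == ssl.CERT_REQUIRED and ctx.check_hostname


with warnings.catch_warnings():
    # Newer Python versions flag this protocol constant as deprecated.
    warnings.simplefilter('ignore', DeprecationWarning)
    bare_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS)

default_ctx = ssl.create_default_context()

print(verifies(bare_ctx))     # bare context: CERT_NONE, no hostname check
print(verifies(default_ctx))  # default context: CERT_REQUIRED plus hostname check
```

A context for which verifies() returns False accepts any certificate, so it does not protect against man-in-the-middle attacks.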

Upvotes: 5

Elias Zamaria

Reputation: 101083

I found a library that does what I'm trying to do: Certifi. It can be installed by running pip install certifi from the command line.

Making requests and verifying them is now easy:

import certifi
import urllib.request

urllib.request.urlopen("https://example.com/", cafile=certifi.where())

As I expected, this returns an HTTPResponse object for a site with a valid certificate and raises an ssl.CertificateError exception for a site with an invalid certificate.
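
Depending on the Python version, the verification failure may reach the caller directly or wrapped in a urllib.error.URLError. The following is a minimal sketch for telling certificate failures apart from other errors, assuming Python 3.7+ (where ssl.CertificateError is an alias of ssl.SSLCertVerificationError). The names is_certificate_failure and fetch are hypothetical.

```python
import ssl
import urllib.error
import urllib.request


def is_certificate_failure(exc):
    """Return True if exc looks like a TLS certificate verification failure."""
    if isinstance(exc, ssl.CertificateError):
        return True
    # urlopen() wraps OSError subclasses in URLError and keeps the cause in .reason.
    return (isinstance(exc, urllib.error.URLError)
            and isinstance(exc.reason, ssl.CertificateError))


def fetch(url, context):
    """Read url, turning certificate failures into a clearer RuntimeError."""
    try:
        with urllib.request.urlopen(url, context=context) as response:
            return response.read()
    except (urllib.error.URLError, ssl.CertificateError) as exc:
        if is_certificate_failure(exc):
            raise RuntimeError('certificate verification failed for ' + url) from exc
        raise
```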

Upvotes: 8

Steffen Ullrich

Reputation: 123320

You can download Mozilla's CA certificates in a format usable by urllib (e.g. PEM) from http://curl.haxx.se/docs/caextract.html

Upvotes: 2
