Pawel Miech

Reputation: 7822

Why is Python 3 http.client so much faster than python-requests?

I was testing different Python HTTP libraries today and realized that the http.client library seems to perform much faster than requests.

To test it, you can run the following two code samples.

import http.client

conn = http.client.HTTPConnection("localhost", port=8000)
for i in range(1000):
    conn.request("GET", "/")
    r1 = conn.getresponse()
    body = r1.read()
    print(r1.status)

conn.close()

and here is code doing the same thing with python-requests:

import requests

with requests.Session() as session:
    for i in range(1000):
        r = session.get("http://localhost:8000")
        print(r.status_code)

If I start Python's built-in HTTP server:

> python -m http.server

and run the above code samples (I'm using Python 3.5.2), I get the following results:

http.client:

0.35user 0.10system 0:00.71elapsed 64%CPU 

python-requests:

1.76user 0.10system 0:02.17elapsed 85%CPU 

Are my measurements and tests correct? Can you reproduce them too? If so, does anyone know what's going on inside http.client that makes it so much faster? Why is there such a big difference in processing time?
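For anyone who wants to reproduce this without relying on the shell's time command, here is a minimal in-process harness (just the two loops from above wrapped in time.perf_counter, with the printing removed; it assumes the same local server is running on port 8000):

import http.client
import time

import requests

N = 1000

# Time the http.client version.
start = time.perf_counter()
conn = http.client.HTTPConnection("localhost", port=8000)
for _ in range(N):
    conn.request("GET", "/")
    conn.getresponse().read()
conn.close()
print("http.client:", time.perf_counter() - start)

# Time the python-requests version.
start = time.perf_counter()
with requests.Session() as session:
    for _ in range(N):
        session.get("http://localhost:8000")
print("requests:", time.perf_counter() - start)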

Upvotes: 36

Views: 35793

Answers (2)

Pawel Miech

Reputation: 7822

Copy-pasting the response from @Lukasa posted in the python-requests repo:

The reason Requests is slower is that it does substantially more than httplib. httplib can be thought of as the bottom layer of the stack: it does the low-level wrangling of sockets. Requests is two layers further up, and adds things like cookies, connection pooling, additional settings, and all kinds of other fun things. This is necessarily going to slow things down. We simply have to compute a lot more than httplib does.

You can see this by looking at cProfile results for Requests: there's just way more going on than there is for httplib. This is always to be expected with high-level libraries: they add more overhead because they have to do a lot more work.

While we can look at targeted performance improvements, the sheer height of the call stack in all cases is going to hurt our performance markedly. That means that the complaint that "requests is slower than httplib" is always going to be true: it's like complaining that "requests is slower than sending carefully crafted raw bytes down sockets." That's true, and it'll always be true: there's nothing we can do about that.
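To see this yourself, a quick way to get those cProfile results is a sketch like the following (it assumes the local http.server from the question is running on port 8000; the printing is dropped so it doesn't dominate the profile):

import cProfile

import requests

def fetch_requests(n=1000):
    # Same request loop as in the question, minus the printing.
    with requests.Session() as session:
        for _ in range(n):
            session.get("http://localhost:8000")

# Sorting by total time per function shows where requests spends
# its extra cycles compared to http.client.
cProfile.run("fetch_requests()", sort="tottime")

Running the equivalent http.client loop under the same profiler makes the difference in call-stack depth obvious.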

Upvotes: 27

Jason S

Reputation: 13779

Based on profiling both, the main difference appears to be that the requests version does a DNS lookup for every request, while the http.client version does so only once.

# http.client
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1974    0.541    0.000    0.541    0.000 {method 'recv_into' of '_socket.socket' objects}
     1000    0.020    0.000    0.045    0.000 feedparser.py:470(_parse_headers)
    13000    0.015    0.000    0.563    0.000 {method 'readline' of '_io.BufferedReader' objects}
...

# requests
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1481    0.827    0.001    0.827    0.001 {method 'recv_into' of '_socket.socket' objects}
     1000    0.377    0.000    0.382    0.000 {built-in method _socket.gethostbyname}
     1000    0.123    0.000    0.123    0.000 {built-in method _scproxy._get_proxy_settings}
     1000    0.111    0.000    0.111    0.000 {built-in method _scproxy._get_proxies}
    92000    0.068    0.000    0.284    0.000 _collections_abc.py:675(__iter__)
...

You're providing the hostname to http.client.HTTPConnection() once, so it makes sense it would call gethostbyname once. requests.Session probably could cache hostname lookups, but it apparently does not.

EDIT: After some further research, it's not just a simple matter of caching. requests has a function for determining whether to bypass proxies, and that function ends up invoking gethostbyname on every request, independent of the actual HTTP request itself.
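If that proxy check is the dominant cost in your environment, one experiment worth trying (a sketch, not a guaranteed fix, since the exact behaviour depends on platform and requests version) is to disable environment proxy handling on the session:

import requests

with requests.Session() as session:
    # trust_env=False tells requests not to consult environment/system
    # proxy settings, which is the code path that triggers the proxy
    # bypass check (and its gethostbyname call) seen in the profile above.
    session.trust_env = False
    for i in range(1000):
        r = session.get("http://localhost:8000")
        print(r.status_code)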

Upvotes: 32
