Brad Solomon

Reputation: 40878

When should a single session instance be used for requests?

From the aiohttp docs:

[An aiohttp.ClientSession] encapsulates a connection pool (connector instance) and supports keepalives by default. Unless you are connecting to a large, unknown number of different servers over the lifetime of your application, it is suggested you use a single session for the lifetime of your application to benefit from connection pooling.

I have almost always used the practice of keeping a single ClientSession instance (with cookies enabled & a custom connector/adapter*) for any size or container of URLs, no matter how heterogeneous those URLs are or how many of them there are. I would like to know if there are downsides to that approach.

I'm hoping to have a more granular, contextual definition of what "large, unknown number of different servers" constitutes in practice. What are the best practices for cases like the one presented below? Should a ClientSession be dedicated to each netloc, rather than a single instance for the whole set?** Is the decision over whether to use a single client session dictated solely by response time?

It is often the case that I have "batches" of endpoints; the netloc for each batch is homogeneous, but the netlocs between batches are different. For example,

urls = {
    'https://aiohttp.readthedocs.io/en/stable/index.html',
    'https://aiohttp.readthedocs.io/en/stable/client_reference.html',
    'https://aiohttp.readthedocs.io/en/stable/web_advanced.html#aiohttp-web-middlewares',

    'https://www.thesaurus.com/',
    'https://www.thesaurus.com/browse/encapsulate',
    'https://www.thesaurus.com/browse/connection?s=t',

    'https://httpbin.org/',
    'https://httpbin.org/#/HTTP_Methods',
    'https://httpbin.org/status/200'
}

To put a number on it, in reality each batch is probably of length 25-50.


*What I have done now is to limit open connections to any single host by passing a connector instance to ClientSession, which is aiohttp.TCPConnector(limit_per_host=10).
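As a rough sketch, the single-session setup described in this footnote might look like the following (assuming aiohttp 3.x; the helper name `fetch_all` is mine, and the limit value is just the one from the footnote):

```python
import asyncio

import aiohttp


async def fetch_all(urls):
    """Fetch every URL through one shared session, i.e. one connection pool."""
    # Cap open connections to any single host, as described above.
    connector = aiohttp.TCPConnector(limit_per_host=10)
    async with aiohttp.ClientSession(connector=connector) as session:

        async def fetch(url):
            async with session.get(url) as resp:
                return url, resp.status

        return await asyncio.gather(*(fetch(u) for u in urls))
```

Running `asyncio.run(fetch_all(urls))` would then drive the whole heterogeneous batch through that one pool.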

**Specifically, {'www.thesaurus.com', 'aiohttp.readthedocs.io', 'httpbin.org'} i.e. set(urllib.parse.urlsplit(u).netloc for u in urls).

Upvotes: 5

Views: 1510

Answers (1)

Martijn Pieters

Reputation: 1121744

You'd want to use a dedicated session with its own connector when

  1. You want to customise the connector parameters for a set of connections (say, alter the limit per host, or alter the SSL configuration, or set different timeouts).
  2. You'd run through the default limit of 100 connections, at which point cached connections to existing hosts are just as likely to have been recycled as to still be open.

The latter scenario is what the documentation hints at. Say you have a large number of unique hosts to connect to (where a unique host is a unique combination of hostname, port number and whether or not SSL is used), but some of those hosts are being contacted more often than others. If that 'large number' is > 100, then chances are that you have to keep opening new connections for the 'frequent' hosts you already connected to before, because the pool had to close them to create a connection for a host not currently in the pool. That'll hurt performance.

But if you created a separate pool for the 'frequent' hosts, then you can keep those host connections open much longer. They don't have to compete for free connections from the 'general use' pool with all those infrequent host connections.

In aiohttp you create separate pools by using separate sessions; you'll then have to define logic to pick what session to use for a given request.
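One possible shape for that routing logic, keyed on netloc as the question suggests (the `SessionRouter` name and the limit values are hypothetical, not part of aiohttp):

```python
import asyncio
from urllib.parse import urlsplit

import aiohttp


class SessionRouter:
    """Lazily create and reuse one ClientSession (one pool) per netloc."""

    def __init__(self):
        self._sessions = {}

    def session_for(self, url):
        # Route on the netloc, e.g. 'www.thesaurus.com' or 'httpbin.org'.
        netloc = urlsplit(url).netloc
        if netloc not in self._sessions:
            self._sessions[netloc] = aiohttp.ClientSession(
                connector=aiohttp.TCPConnector(limit_per_host=10)
            )
        return self._sessions[netloc]

    async def close(self):
        await asyncio.gather(*(s.close() for s in self._sessions.values()))
```

With this, two URLs on the same host share a session (and its keepalive connections), while different hosts get pools of their own.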

For comparison, the requests library (a synchronous HTTP API) handles this a little differently: you can register separate transport adapters per URL prefix on a single session.
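In requests that registration looks like this; the pool sizes passed to `HTTPAdapter` are illustrative:

```python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()

# Give one 'frequent' host its own, larger connection pool; everything
# else falls through to the default adapters mounted on the session.
session.mount(
    "https://aiohttp.readthedocs.io/",
    HTTPAdapter(pool_connections=1, pool_maxsize=20),
)
```

Requests picks the adapter with the longest matching URL prefix for each request, so the frequent host's connections never compete with the general-use pool.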

Upvotes: 5
