Zak Stucke

Reputation: 452

Requests / Asyncio: Is there a drawback to setting pool_maxsize=1000 with a Python requests session?

I am using the following code to increase the pool maxsize with requests:

import requests
session = requests.Session()
session.mount("https://", requests.adapters.HTTPAdapter(pool_maxsize=50))
session.mount("http://", requests.adapters.HTTPAdapter(pool_maxsize=50))

Is there a drawback to setting pool_maxsize=1000?

I sometimes need 50 - 1000 connections, but most of the time I only need 1 connection.

Alternatively, is there a way to allow dynamic pool sizing?

Which solution is best:

  1. Set pool_maxsize=1000.
  2. Create two sessions, one with pool_maxsize=1 and the other with pool_maxsize=1000.
  3. Dynamically alter pool_maxsize as and when I need a different number of connections, if that is possible (see the sketch below).

Speed is paramount!
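
For reference on option 3: requests lets you replace an adapter by mounting the same URL prefix again, so the pool could be "resized" by remounting before a burst. A minimal sketch of that idea (remount_with_pool_size is a hypothetical helper; note that remounting discards the old adapter's pooled connections):

import requests

def remount_with_pool_size(session, size):
    # Mounting the same prefix again replaces the previous adapter,
    # dropping its pooled connections along with it.
    session.mount("https://", requests.adapters.HTTPAdapter(pool_maxsize=size))
    session.mount("http://", requests.adapters.HTTPAdapter(pool_maxsize=size))

session = requests.Session()
remount_with_pool_size(session, 1)      # normal single-request usage
# ... later, before a burst of concurrent requests:
remount_with_pool_size(session, 1000)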

Edit: Most of the time I'm doing normal requests:

session.get(....)

But sometimes I am using asyncio where I will have a large number of requests to carry out:

import asyncio

async def perform_async_calls(session, urls):
    loop = asyncio.get_event_loop()
    futures = []
    for url in urls:
        # run the blocking session.get calls on the default thread pool
        futures.append(loop.run_in_executor(None, session.get, url))

    results = []
    for future in futures:
        result = await future
        results.append(result.json())

    return results

Upvotes: 1

Views: 1064

Answers (1)

daz

Reputation: 784

In HTTP/1.1, clients can send multiple requests through the same connection: persistent connections are the default (under HTTP/1.0 the client had to ask for this with the Connection: keep-alive header). Without keep-alive, you would have to open a new connection for every single request.

Opening a connection is costly in time, as it requires an additional round trip for the TCP handshake (and more round trips for TLS) before you can make the request, so it is faster to reuse an existing connection instead.

Connection pooling means that after you make a request, the connection is kept open and set aside in the pool for later requests. Requests handles keep-alive and manages the pool behind the scenes, so you typically don't need to worry about it.
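
As a rough illustration of both points, assuming https://example.com/ as a placeholder URL:

import time
import requests

session = requests.Session()
for i in range(3):
    start = time.perf_counter()
    session.get("https://example.com/")  # first call opens a connection,
                                         # later calls reuse it from the pool
    print(f"request {i}: {time.perf_counter() - start:.3f}s")

Typically the first request is noticeably slower than the later ones, since only it pays for the TCP (and TLS) handshake.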

If you are using multithreading, the pool synchronizes access to the connections so that each connection is only used by one thread at a time. Having multiple threads make requests simultaneously requires multiple connections in the pool.

Having more connections than threads won't increase performance much at all, because requests blocks the thread while a request is in flight, so a single thread cannot use multiple connections at a time. Getting more performance out of additional connections requires increasing the number of threads as well.
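
A minimal sketch of matching pool size to thread count, assuming 50 workers and placeholder URLs:

from concurrent.futures import ThreadPoolExecutor
import requests

N = 50  # one pooled connection per worker thread
session = requests.Session()
adapter = requests.adapters.HTTPAdapter(pool_maxsize=N)
session.mount("https://", adapter)
session.mount("http://", adapter)

urls = ["https://example.com/"] * N  # placeholder URLs
with ThreadPoolExecutor(max_workers=N) as executor:
    responses = list(executor.map(session.get, urls))

Matching pool_maxsize to max_workers means no thread ever waits for a connection and no pooled connection sits idle.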

Adding connections like this only helps while you are not bottlenecked by network speed, though; opening more connections doesn't make the internet faster. So keep adding threads and connections until performance stops increasing.

I'm not sure what you mean by dynamic resizing. The pool opens new connections when needed and reuses old ones when it can; it only stops opening new connections once the max is reached. In that sense the pool size is already dynamic, up to the max.
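
In other words, pool_maxsize is an upper bound on how many connections are kept around for reuse, not a number opened up front. A minimal sketch of the relevant knobs (pool_block is a real HTTPAdapter parameter; the comments describe its default behaviour):

import requests

# The pool starts empty and opens connections on demand, keeping at most
# pool_maxsize of them around for reuse. With the default pool_block=False,
# requests beyond the max still succeed; the surplus connections are simply
# closed afterwards instead of being returned to the pool.
adapter = requests.adapters.HTTPAdapter(pool_maxsize=50, pool_block=False)

session = requests.Session()
session.mount("https://", adapter)
session.get("https://example.com/")  # opens exactly one connection, not fifty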

Upvotes: 1
