Aces

Reputation: 37

Why does my process list show multiple threads when running aiohttp?

I'm currently using aiohttp in one of my projects, which uses asyncio. While searching for the cause of high memory usage, I noticed that aiohttp seems to create threads in the background.

I have broken down my code to this minimal code which shows my problem.

import asyncio
import aiohttp
from aiohttp import ClientSession

async def test1(link, session):
    async with session.get(
        link,
    ) as r:
        print(r.status)
        await asyncio.sleep(10)

async def test():
    async with ClientSession(
        cookie_jar=aiohttp.DummyCookieJar(),
    ) as session:
        await asyncio.gather(test1("https://google.com", session))

loop = asyncio.get_event_loop()
loop.run_until_complete(test())
loop.close()

When running this with ps -e -T | grep python3 I get the following output, which is odd because it looks like a thread was created:

 160304  160304 pts/5    00:00:00 python3
 160304  160306 pts/5    00:00:00 python3

If I change the asyncio.gather call to include one more test1 coroutine and run the ps command again, I get three threads instead:

 160414  160414 pts/5    00:00:00 python3
 160414  160416 pts/5    00:00:00 python3
 160414  160417 pts/5    00:00:00 python3
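The extra threads can also be observed from inside Python. A minimal sketch (no aiohttp needed): running a blocking DNS-style call through the event loop's default executor, the same mechanism asyncio uses internally for getaddrinfo, spawns a worker thread that then shows up in threading.enumerate():

```python
import asyncio
import socket
import threading

async def main():
    loop = asyncio.get_running_loop()
    # Offload a blocking name lookup to the default thread pool, the same
    # way asyncio resolves hostnames internally. This lazily spawns a
    # worker thread that stays alive, idle, in the pool afterwards.
    await loop.run_in_executor(None, socket.gethostbyname, "localhost")
    return [t.name for t in threading.enumerate()]

names = asyncio.run(main())
print(names)
```

The worker thread persists after the call returns, which is why ps keeps showing it even while the program is merely sleeping.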

This looks very problematic because my assumption was that aiohttp runs its event loop in a single thread. That is why I used a ThreadPoolExecutor to launch a fixed number of threads at the start of the program. If aiohttp creates a new thread for every session.get request, then the total thread count could be as high as X specified threads * the number of currently running HTTP requests.

For more context:

The purpose of my main program is to save the HTML of X domains as quickly as possible. The current architecture uses a ThreadPoolExecutor to spin up Y threads that live for the whole application lifetime; each thread then sends Z HTTP requests simultaneously using session.get and asyncio.gather. Is this the wrong approach, and should I use another Python library instead of aiohttp? Is threading in combination with event loops redundant?
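For what it's worth, a single event loop with a semaphore to cap in-flight requests is the usual shape of this kind of crawler, with no ThreadPoolExecutor involved. A minimal sketch, where fetch is a hypothetical stand-in for a real session.get call:

```python
import asyncio

async def fetch(url, sem):
    # Stand-in for an aiohttp session.get call; the semaphore bounds how
    # many "requests" are in flight at once on the single event loop.
    async with sem:
        await asyncio.sleep(0.01)  # simulate network I/O
        return f"<html for {url}>"

async def crawl(urls, max_concurrency=20):
    sem = asyncio.Semaphore(max_concurrency)
    # All coroutines run concurrently on one loop in one thread.
    return await asyncio.gather(*(fetch(u, sem) for u in urls))

pages = asyncio.run(crawl([f"https://example.com/{i}" for i in range(100)]))
print(len(pages))
```

One loop in one thread can keep thousands of sockets in flight, so the threads * requests multiplication never happens.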

I have searched around on the web and I have not found an answer to this question, so I'm humbly asking the community for any smart input.

Upvotes: 3

Views: 765

Answers (1)

Andrew Svetlov

Reputation: 17386

asyncio always has a thread pool under the hood with up to min(32, (os.cpu_count() or 1) + 4) threads.

The pool is used by asyncio for DNS lookup internally.

Moreover, even if you set up aiohttp to use aiodns for DNS resolution, the default asyncio pool still exists (though it does nothing).

In turn, aiohttp uses the default thread pool for some operations, mostly for local file handling.

For example, await session.post(url, data=open('filename', 'rb')) reads the file chunks for sending in threads; it helps to avoid long blocking calls.
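If the thread count matters to you, the implicit pool can be replaced with one of a chosen size via loop.set_default_executor. A minimal sketch (the two-worker cap and the trivial blocking callable are just illustrations):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

async def main():
    loop = asyncio.get_running_loop()
    # Swap asyncio's implicit pool for one capped at 2 worker threads;
    # run_in_executor(None, ...) now dispatches to this pool instead.
    loop.set_default_executor(ThreadPoolExecutor(max_workers=2))
    # Any blocking call offloaded by asyncio (or by you) uses the new pool.
    return await loop.run_in_executor(None, lambda: 21 * 2)

result = asyncio.run(main())
print(result)
```

This bounds the background threads without changing how aiohttp or asyncio use them.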

Upvotes: 4

Related Questions