Reputation: 21
I'm working on a Python application that performs a large number of I/O-bound operations. I decided to use the threading
module to speed up the process by handling multiple I/O operations concurrently. However, I've noticed that instead of speeding up, my script actually slows down as I increase the number of threads beyond a certain point.
When I run the script with up to 10 threads, performance improves as expected. But when I go beyond 10 threads, the script starts to slow down significantly. By the time I reach 50 threads, the performance is worse than running with just a single thread!
I decided to visualize the CPU usage using Flame Graphs. Surprisingly, the Flame Graphs showed a significant amount of time being spent in what appears to be lock contention. This raised questions about whether the GIL is impacting my I/O-bound threads, or if there's another form of lock contention happening.
Here’s an example snippet of the Python code I’m using:
```python
import threading
import requests

def download_page(url):
    response = requests.get(url)
    # Simulate processing
    return len(response.content)

urls = ["http://example.com"] * 1000

def worker():
    while urls:
        url = urls.pop()
        print(f"Downloaded {download_page(url)} bytes")

threads = []
for _ in range(50):  # Adjust thread count here
    thread = threading.Thread(target=worker)
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()
```
### What I've Tried:

- **GIL Awareness:** I understand Python's Global Interpreter Lock (GIL) might be a bottleneck, but since this is an I/O-bound task, I expected threading to be beneficial.
- **Resource Limits:** I checked system resource limits (`ulimit`) to ensure there are no restrictions on the number of threads.
- **Thread Pool Executor:** I tried using `concurrent.futures.ThreadPoolExecutor` instead of manually managing threads, but the issue persists.

### GIL or Something Else?

Given that this is I/O-bound, why does adding more threads cause a slowdown? Could the GIL still be a factor, or is there another form of lock contention happening?
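For reference, the `ThreadPoolExecutor` variant I tried is essentially the following. (The download is stubbed out with a sleep here so the snippet is self-contained; in the real script it calls `requests.get` exactly as above.)

```python
import time
from concurrent.futures import ThreadPoolExecutor

def download_page(url):
    # Stand-in for requests.get(url): simulates a blocking I/O call.
    time.sleep(0.01)
    return len(url)

urls = ["http://example.com"] * 100

# executor.map distributes the URLs across the pool's worker threads.
with ThreadPoolExecutor(max_workers=50) as executor:
    sizes = list(executor.map(download_page, urls))

print(f"Downloaded {len(sizes)} pages, {sum(sizes)} bytes total")
```

The slowdown pattern is the same as with the manually managed threads.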
Upvotes: 0
Views: 82
Reputation:
It's important to understand the difference between concurrency and parallelism. You said concurrent, but were surprised the program didn't run faster. Well, why would it? A concurrency model simulates simultaneous tasks by sharing processing time across multiple threads. In some languages this does, in fact, translate to multiple processors working the threads in parallel, but that is implementation- and hardware-specific. In general, every thread you add brings overhead, and past some point context switching slows your program down.

In Python (specifically CPython) this is compounded by the Global Interpreter Lock, usually just called the GIL: only one thread can execute Python bytecode at a time. The GIL *is* released around blocking I/O calls (such as the socket reads inside `requests.get`), which is why threading helps I/O-bound code up to a point. But every thread must reacquire the GIL to run the Python code between its I/O calls, so with 50 threads you have 50 workers queuing for the same lock, on top of OS-level context switching. That contention is exactly what your Flame Graphs are showing.
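There is also a smaller, avoidable source of trouble in the posted code: `while urls:` followed by `urls.pop()` is a check-then-act race, so two threads can both pass the check just before the list empties and one of them will raise `IndexError`. The thread-safe version of that pattern is `queue.Queue`. A minimal sketch, with the download stubbed out as a sleep so it runs standalone:

```python
import queue
import threading
import time

def download_page(url):
    # Stand-in for the real blocking requests.get(url) call.
    time.sleep(0.01)
    return len(url)

tasks = queue.Queue()
for url in ["http://example.com"] * 100:
    tasks.put(url)

results = []
results_lock = threading.Lock()

def worker():
    while True:
        try:
            url = tasks.get_nowait()  # atomic: no check-then-act race
        except queue.Empty:
            return
        size = download_page(url)
        with results_lock:
            results.append(size)

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Downloaded {len(results)} pages")
```

This fixes correctness, not the contention itself; the real answers to the slowdown are below.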
To resolve your problem, there are a few choices:

- The `asyncio` module has been in the standard library since Python 3.4 (the `async`/`await` syntax arrived in 3.5) and was designed for exactly this kind of I/O-bound program. Using `asyncio` may not make any individual request faster, but a single event loop can wait on hundreds of connections at once in one thread, which eliminates the thread/GIL issue.
- The `multiprocessing` module uses (about) the same interface as the `threading` module, and may be close to a drop-in replacement for the threading code you've already written. `multiprocessing` is a true parallelism paradigm: it starts new interpreter processes, managed by your OS, that execute your code in parallel on different CPUs, each with its own GIL.

Upvotes: 1
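The asyncio route might look like the sketch below. Note that `requests` is blocking and would stall the event loop, so real code would use an async HTTP client such as `aiohttp` (my assumption; it's not in the original code). Here the download is simulated with `asyncio.sleep` so the sketch is self-contained:

```python
import asyncio

async def download_page(url):
    # Stand-in for an async HTTP request (e.g. via aiohttp).
    await asyncio.sleep(0.01)
    return len(url)

async def main():
    urls = ["http://example.com"] * 100
    # One event loop, one thread: all 100 "downloads" overlap in time,
    # with no per-thread stacks and no GIL handoffs between workers.
    sizes = await asyncio.gather(*(download_page(u) for u in urls))
    print(f"Downloaded {len(sizes)} pages, {sum(sizes)} bytes total")
    return sizes

sizes = asyncio.run(main())
```

In practice you would also cap concurrency (e.g. with `asyncio.Semaphore`) so you don't open hundreds of sockets to one host at once.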