qalis

Reputation: 1523

Requests Session asynchronous usage?

My current code creates a separate Session object for every request, because each request goes through the module-level .get() function:

content_getters.py (the relevant part):

import requests


def get_page_content(link: str) -> bytes:
    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; "
                             "Intel Mac OS X 10_11_6) "
                             "AppleWebKit/537.36 (KHTML, like Gecko) "
                             "Chrome/61.0.3163.100 Safari/537.36"}

    # requests.get() builds a new Session internally on every call
    response = requests.get(link, headers=headers)

    # check the status before using the body
    if response.status_code != requests.codes.ok:
        raise ConnectionError(
            f"Page {link} returned status code {response.status_code}")

    return response.content

def parse_single_page(link):
    content = get_page_content(link)
    # rest of very long function

main.py:

from concurrent.futures.thread import ThreadPoolExecutor

from content_getters import get_page_content, extract_links, parse_single_page

if __name__ == "__main__":
    MAX_THREADS = 30

    # get links
    html: str = get_page_content(
        "https://www.d20pfsrd.com/bestiary/bestiary-hub/monsters-by-cr/") \
        .decode("utf-8")

    links = extract_links(html)

    num_threads = min(MAX_THREADS, len(links))
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        # the calls run concurrently in worker threads; map() yields the
        # results in the same order as links
        results = list(executor.map(parse_single_page, links))

The requests docs (link) state that "if you’re making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase". I suppose that each of my separate calls to .get() creates its own Session object, so reusing a single Session could be faster.

Question: Is the Session object synchronous (sequential) for all requests made with it? Will I still get concurrent requests if I share one Session object across all threads in concurrent.futures.thread.ThreadPoolExecutor, instead of one Session per request as I'm doing now?

Upvotes: 0

Views: 2541

Answers (2)

Bishwajit Ghosh

Reputation: 11

According to the documentation, requests.Session uses urllib3's connection pooling. And according to urllib3's documentation, that connection pool is now thread-safe.

It probably wasn't when the question was originally posted, but judging by a GitHub comment, it has most likely been made thread-safe for good since then.
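If a shared Session is indeed safe to use from several threads, the change to the code in the question is small: create one Session and pass it to every worker. The following is only a minimal sketch under that assumption. It does not prove thread-safety, and `fetch_all` is a hypothetical helper name, not something from the question or from requests.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

import requests


def get_page_content(session: requests.Session, link: str) -> bytes:
    """Fetch one page through the shared session and its connection pool."""
    response = session.get(link, timeout=30)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.content


def fetch_all(session: requests.Session, links: list[str],
              max_threads: int = 30) -> list[bytes]:
    """Hypothetical helper: download all links using one shared session."""
    if not links:
        return []
    num_threads = min(max_threads, len(links))
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        # every worker thread calls session.get() on the same Session;
        # map() yields the results in the order of links
        return list(executor.map(partial(get_page_content, session), links))
```

One detail to keep in mind: the default HTTPAdapter keeps at most 10 connections per host in its pool (pool_maxsize=10). With 30 threads hitting the same host, the extra connections are opened and then discarded instead of being reused. Mounting requests.adapters.HTTPAdapter(pool_maxsize=30) on the session for "https://" is one way to raise that limit.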

Upvotes: 1

rawrex

Reputation: 4064

In short, Session is not thread-safe; you can check the issue discussion on GitHub.
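If you stay with threads, one conservative way around this is to give each worker thread its own Session, so connections are still reused within a thread but no Session is ever shared between threads. A minimal sketch, assuming a hypothetical `get_session` helper (the name is illustrative, not part of requests):

```python
import threading

import requests

# each thread sees its own independent attributes on this object
_thread_local = threading.local()


def get_session() -> requests.Session:
    """Return the calling thread's own Session, creating it on first use."""
    session = getattr(_thread_local, "session", None)
    if session is None:
        session = requests.Session()
        _thread_local.session = session
    return session


def get_page_content(link: str) -> bytes:
    """Fetch one page with the Session that belongs to the current thread."""
    response = get_session().get(link, timeout=30)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.content
```

Because ThreadPoolExecutor reuses its worker threads, a pool of 30 workers ends up with at most 30 Session objects, however many links are processed.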

For your case, I would highly recommend looking at asyncio and the aiohttp module, where you are free to pass a session around because everything runs in one thread. It also incurs less overhead than multithreading. As they say:

Use asyncio when you can, use threads when you must

The documentation on aiohttp.
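As a rough illustration of that approach, here is a minimal sketch of downloading all links with a single aiohttp ClientSession. The `fetch_all` helper and the `max_concurrency` parameter are hypothetical names chosen for this example; the semaphore is an assumption that stands in for the MAX_THREADS limit in the question.

```python
import asyncio

import aiohttp


async def get_page_content(session: aiohttp.ClientSession, link: str) -> bytes:
    """Fetch one page; the same session is shared by all coroutines."""
    async with session.get(link) as response:
        response.raise_for_status()  # raises aiohttp.ClientResponseError on 4xx/5xx
        return await response.read()


async def fetch_all(links: list[str], max_concurrency: int = 30) -> list[bytes]:
    """Hypothetical helper: download all links with one ClientSession."""
    # limits how many requests are in flight at the same time
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(session: aiohttp.ClientSession, link: str) -> bytes:
        async with semaphore:
            return await get_page_content(session, link)

    async with aiohttp.ClientSession() as session:
        # gather() returns the results in the order of links
        return await asyncio.gather(*(bounded(session, link) for link in links))
```

From synchronous code this would be called as pages = asyncio.run(fetch_all(links)). Only the downloads run concurrently here; parsing each page still happens in the single event-loop thread.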

Upvotes: 2
