ab217

Reputation: 17160

python/httpx/asyncio: httpx.RemoteProtocolError: Server disconnected without sending a response

I am attempting to optimize a simple web scraper that I made. It gets a list of urls from a table on a main page and then visits each of those "sub" urls to pull information from those pages. I was able to write it successfully both synchronously and with concurrent.futures.ThreadPoolExecutor(). However, I am now trying to rewrite it with asyncio and httpx, as these seem to be very fast for making hundreds of HTTP requests.

I wrote the following script using asyncio and httpx; however, I keep getting the following errors:

httpcore.RemoteProtocolError: Server disconnected without sending a response.

RuntimeError: The connection pool was closed while 4 HTTP requests/responses were still in-flight.

It appears that I keep losing the connection when I run the script. I even attempted running a synchronous version of it and got the same error. I thought the remote server might be blocking my requests, but I am able to run my original program and visit each of the urls from the same IP address without issue.

What would cause this exception and how do you fix it?

import httpx
import asyncio

async def get_response(client, url):
    resp = await client.get(url, headers=random_user_agent()) # Gets a random user agent.
    html = resp.text
    return html


async def main():
    async with httpx.AsyncClient() as client:
        tasks = []

        # Get list of urls to parse.
        urls = get_events('https://main-url-to-parse.com')
        
        # Get the responses for the detail page for each event
        for url in urls:
            tasks.append(asyncio.ensure_future(get_response(client, url)))
            
        detail_responses = await asyncio.gather(*tasks)

        for resp in detail_responses:
            event = get_details(resp) # Parse url and get desired info
        
asyncio.run(main())

Upvotes: 7

Views: 12823

Answers (1)

Basalex

Reputation: 1187

I've had the same issue. The problem occurs when one of the asyncio.gather tasks raises an exception: as it propagates, it exits the async with block, which makes httpx.AsyncClient call __aexit__ and close the connection pool while the remaining requests are still in flight. You can bypass this by passing return_exceptions=True to asyncio.gather, so exceptions are collected in the results list instead of being raised.

async def main():
    async with httpx.AsyncClient() as client:
        tasks = []

        # Get list of urls to parse.
        urls = get_events('https://main-url-to-parse.com')

        # Get the responses for the detail page for each event.
        for url in urls:
            tasks.append(asyncio.ensure_future(get_response(client, url)))

        detail_responses = await asyncio.gather(*tasks, return_exceptions=True)

        for resp in detail_responses:
            # Here you would need to do something with the exceptions,
            # e.g.: if isinstance(resp, Exception): ...
            event = get_details(resp)  # Parse url and get desired info
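
If the server still disconnects with return_exceptions=True in place, it is often because hundreds of connections are being opened at once. As a rough sketch (not tested against your server), you could cap concurrency with an asyncio.Semaphore and httpx.Limits; get_events and get_details are your helpers from the question, and the limit/timeout values are guesses you would tune:

import asyncio
import httpx

async def get_response(client, semaphore, url):
    # The semaphore caps how many requests run at the same time,
    # so the server is less likely to drop connections mid-flight.
    async with semaphore:
        resp = await client.get(url)
        resp.raise_for_status()
        return resp.text

async def main():
    # Assumed values; tune them for the target server.
    limits = httpx.Limits(max_connections=10, max_keepalive_connections=5)
    timeout = httpx.Timeout(10.0)
    semaphore = asyncio.Semaphore(10)

    async with httpx.AsyncClient(limits=limits, timeout=timeout) as client:
        urls = get_events('https://main-url-to-parse.com')
        tasks = [get_response(client, semaphore, url) for url in urls]
        results = await asyncio.gather(*tasks, return_exceptions=True)

    # Failed tasks show up as exception objects in the results list,
    # so handle them instead of parsing them.
    for url, result in zip(urls, results):
        if isinstance(result, Exception):
            print(f'{url} failed: {result!r}')
        else:
            event = get_details(result)

asyncio.run(main())

With this setup, one bad URL no longer takes down the whole pool, and the server sees at most ten open connections at a time.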

Upvotes: 9
