Python aiohttp (with asyncio) sends requests very slowly

Question

Situation: I am trying to send an HTTP request to every domain listed in a file I have already downloaded and record the destination URL each request is redirected to.

Problem: I followed a tutorial, but I get far fewer responses than expected: around 100 responses per second (about 6,000 per minute), whereas the tutorial reports 100,000 responses per minute. The script also gets slower and slower after a few seconds, until I get only 1 response every 5 seconds.
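
To make the gap concrete, the rates can be put on the same scale. This is only arithmetic on the figures quoted above; `per_minute` is a hypothetical helper name, not something from the tutorial or from my script.

```python
def per_minute(responses_per_second: float) -> float:
    """Convert a responses-per-second rate to responses per minute."""
    return responses_per_second * 60


observed = per_minute(100)  # best rate the script reaches
tutorial = 100_000          # responses per minute quoted in the tutorial
degraded = 60 / 5           # one response every 5 seconds, per minute

print(observed)                       # 6000
print(round(tutorial / observed, 1))  # 16.7
print(degraded)                       # 12.0
```

So even at its best the script is roughly 17 times slower than the tutorial's figure, and once it degrades it drops to about 12 responses per minute.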

Already tried: At first I thought the problem was that I was running the script on a Windows server. When I tried it on my own computer (Unix, macOS), it was only slightly faster. On another Linux server, the speed was the same as on my computer.

Code: https://pastebin.com/WjLegw7K

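import asyncio
import glob
import os
import re

from aiohttp import ClientSession

# Use the absolute path so work_dir is never an empty string.
work_dir = os.path.dirname(os.path.abspath(__file__))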

async def fetch(url, session):
    try:
        async with session.get(url, ssl=False) as response:
            if response.status == 200:
                delay = response.headers.get("DELAY")
                date = response.headers.get("DATE")
                print("{}:{} with delay {}".format(date, response.url, delay))
                return await response.read()
    except Exception:
        # Failed requests (DNS errors, refused connections, ...) are silently ignored.
        pass

async def bound_fetch(sem, url, session):
    # Wait for a free semaphore slot, then fetch and pass the result on.
    async with sem:
        return await fetch(url, session)


async def run():
    os.chdir(work_dir)
    for file in glob.glob("cdx-*"):
        print("Opening: " + file)
        opened_file = file
        tasks = []
        # create instance of Semaphore
        sem = asyncio.Semaphore(40000)
        with open(work_dir + '/' + file) as infile:
            seen = set()
            async with ClientSession() as session:
                # Compile the pattern once instead of once per line.
                regex = re.compile(r'://(.*?)/')
                for line in infile:
                    match = regex.search(line)
                    if match is None:
                        # Skip lines that contain no URL.
                        continue
                    domain = match.group(1).lower()

                    if domain not in seen:
                        seen.add(domain)

                        task = asyncio.ensure_future(bound_fetch(sem, 'http://' + domain, session))
                        tasks.append(task)
                # The with-block closes the file, so no explicit close() or del is needed.
                await asyncio.gather(*tasks)


# asyncio.run() (Python 3.7+) creates the event loop, runs run(), and closes the loop.
asyncio.run(run())
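
The `bound_fetch` pattern above can be tried without any network access, which makes it easier to see how the semaphore and a per-request time limit interact. The following is a minimal sketch under stated assumptions, not a confirmed fix: `fake_fetch`, `bounded`, the limit of 3 and the 0.2-second timeout are all hypothetical stand-ins, and a coroutine that sleeps takes the place of `session.get`. Whether slow or unresponsive hosts are what drags the real script down is an assumption I have not verified.

```python
import asyncio


async def fake_fetch(delay: float) -> float:
    # Hypothetical stand-in for an HTTP request: just sleeps for `delay` seconds.
    await asyncio.sleep(delay)
    return delay


async def bounded(sem: asyncio.Semaphore, delay: float, timeout: float):
    # Hold a semaphore slot only while the request is in flight, and give up
    # after `timeout` seconds so a slow host cannot keep the slot for long.
    async with sem:
        try:
            return await asyncio.wait_for(fake_fetch(delay), timeout)
        except asyncio.TimeoutError:
            return None


async def main():
    sem = asyncio.Semaphore(3)  # at most 3 requests in flight (illustrative value)
    delays = [0.01, 0.01, 5.0, 0.01, 5.0, 0.01]  # two simulated hosts are very slow
    return await asyncio.gather(*(bounded(sem, d, timeout=0.2) for d in delays))


results = asyncio.run(main())
print(results)
```

The fast requests return their delay and the slow ones come back as `None` once the time limit expires, so the semaphore slots they held are freed again. In the real script the same shape would apply with `session.get` in place of `fake_fetch`; aiohttp also has its own `ClientTimeout` setting, which is not shown here.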

I really don't know how to fix this issue, especially because I'm very new to Python... but I have to get it to work somehow :(
