Naveed

Reputation: 602

Computer hangs on large number of Python async aiohttp requests

I have a text file with over 20 million lines in the below format:

ABC123456|fname1 lname1|fname2 lname2
...

My task is to read the file line by line, send both names to the Google transliteration API, and print the results on the terminal (Linux). Below is my code:

import asyncio
import urllib.parse
from aiohttp import ClientSession

async def getResponse(url):
    async with ClientSession() as session:
        async with session.get(url) as response:
            response = await response.read()
            print(response)

loop = asyncio.get_event_loop()

tasks = []
# Google transliteration API endpoint (any URL would do for testing)
url = "https://www.google.com/inputtools/request?{}"

for line in open('tg.txt'):
    vdata = line.split("|")

    if len(vdata) == 3:
        names = vdata[1]+"_"+vdata[2]
        tdata = {"text":names,"ime":"transliteration_en_te"}
        qstring = urllib.parse.urlencode(tdata)
        task = asyncio.ensure_future(getResponse(url.format(qstring)))
        tasks.append(task)

loop.run_until_complete(asyncio.wait(tasks))

In the above code, my file tg.txt contains 20+ million lines. When I run it, my laptop freezes and I have to hard-restart it. But the same code works fine with another file, tg1.txt, which has only 10 lines. What am I missing?

Upvotes: 1

Views: 1436

Answers (1)

Yurii Kramarenko

Reputation: 1064

You can try using asyncio.gather(*futures) instead of asyncio.wait. Also, process the file in batches of a fixed size (for example, 10 lines per batch) and print after each processed batch; that should help you debug your app. Note that your futures can finish in a different order, so it is better to store the result of gather and print it once the whole batch has been processed. (The underlying problem is that your loop creates 20+ million tasks up front, all held in memory at once; batching caps how many are alive at a time.)
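A minimal sketch of the batching idea described above. The network call is simulated with asyncio.sleep so the sketch runs offline; in the real script you would replace fetch with the aiohttp session.get call from the question. BATCH_SIZE, build_url, and fetch are illustrative names, and the batch size of 10 is just the example value from the answer:

```python
import asyncio
import urllib.parse

URL = "https://www.google.com/inputtools/request?{}"
BATCH_SIZE = 10  # example size; tune for your machine and the API's limits

def build_url(line):
    """Turn one 'id|name1|name2' line into a request URL, or None."""
    vdata = line.rstrip("\n").split("|")
    if len(vdata) != 3:
        return None
    qstring = urllib.parse.urlencode(
        {"text": vdata[1] + "_" + vdata[2], "ime": "transliteration_en_te"})
    return URL.format(qstring)

async def fetch(url):
    # Placeholder for the real aiohttp request:
    #   async with session.get(url) as response:
    #       return await response.read()
    await asyncio.sleep(0)  # simulated I/O so this sketch runs offline
    return url

async def process_batch(urls):
    # gather returns results in the same order as its arguments,
    # even though the coroutines may finish in any order
    return await asyncio.gather(*(fetch(u) for u in urls))

async def main(lines):
    batch = []
    for line in lines:
        url = build_url(line)
        if url is None:
            continue
        batch.append(url)
        if len(batch) == BATCH_SIZE:
            for result in await process_batch(batch):
                print(result)
            batch = []          # only BATCH_SIZE tasks alive at once
    if batch:                   # flush the final partial batch
        for result in await process_batch(batch):
            print(result)

if __name__ == "__main__":
    sample = ["ABC123456|fname1 lname1|fname2 lname2\n"] * 25
    asyncio.run(main(sample))
```

Because each batch is awaited before the next one is built, memory stays bounded regardless of file size, and results print in file order within each batch.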

Upvotes: 1
