Reputation: 602
I have a text file with over 20 million lines in the below format:
ABC123456|fname1 lname1|fname2 lname2
.
.
.
.
My task is to read the file line by line and send both the names to Google transliteration API and print the results on the terminal (linux). Below is my code:
import asyncio
import urllib.parse
from aiohttp import ClientSession
async def getResponse(url):
async with ClientSession() as session:
async with session.get(url) as response:
response = await response.read()
print(response)
loop = asyncio.get_event_loop()
tasks = []
# I'm using test server localhost, but you can use any url
url = "https://www.google.com/inputtools/request?{}"
for line in open('tg.txt'):
vdata = line.split("|")
if len(vdata) == 3:
names = vdata[1]+"_"+vdata[2]
tdata = {"text":names,"ime":"transliteration_en_te"}
qstring = urllib.parse.urlencode(tdata)
task = asyncio.ensure_future(getResponse(url.format(qstring)))
tasks.append(task)
loop.run_until_complete(asyncio.wait(tasks))
In the above code, my file tg.txt
contains 20+ million lines. When I run it, my laptop freezes and I have to hard restart it. But this code works fine when I use another file tg1.txt
which has only 10 lines. What am I missing?
Upvotes: 1
Views: 1436
Reputation: 1064
You can try to use asyncio.gather(*futures)
instead of asyncio.wait
.
Also try to do this with batches of fixed size (for example 10 lines per batch) and add print after each processed batch, it should help you to debug your app.
Also your future could finish in different order and it's better to store result of gather and print it when processing of batch is finished.
Upvotes: 1