paskh

Reputation: 35

How to use asyncio and aiohttp instead of sequential for loops?

My code works this way, but it is very slow because of the for loops. Can you help me make it work with aiohttp and asyncio?

import requests
from bs4 import BeautifulSoup


def field_info(field_link):
    response = requests.get(field_link)
    soup = BeautifulSoup(response.text, 'html.parser')
    races = soup.findAll('header', {'class': 'dc-field-header'})
    tables = soup.findAll('table', {'class': 'dc-field-comp'})

    results = []  # one entry per race on the page
    for i in range(len(races)):
        race_name = races[i].find('h3').text
        race_time = races[i].find('time').text

        names = tables[i].findAll('span', {'class': 'title'})
        trainers = tables[i].findAll('span', {'class': 'trainer'})
        table = []

        for j in range(len(names)):
            table.append({
                'Name': names[j].text,
                'Trainer': trainers[j].text,
            })

        results.append({
            'RaceName': race_name,
            'RaceTime': race_time,
            'Table': table
        })

    # return after the loop, so every race is included and not only the first
    return results


links = [link1, link2, link3]
scraped_info = []
for link in links:
    scraped_info += field_info(link)

Upvotes: 2

Views: 900

Answers (1)

Mikhail Gerasimov

Reputation: 39536

1) Create a coroutine that makes the request asynchronously:

import asyncio
import aiohttp


async def get_text(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return await resp.text()

2) Replace every synchronous request with an await of this coroutine, and make the enclosing functions coroutines as well:

async def field_info(field_link):              # async - makes the outer function a coroutine
    text = await get_text(field_link)          # await - gets the result from the async function
    soup = BeautifulSoup(text, 'html.parser')
    # the rest of the parsing stays the same as in the question

3) Make the outer code run the jobs concurrently using asyncio.gather():

async def main():
    links = [link1, link2, link3]

    # gather() must be awaited; it returns the results in the same order as links
    scraped_info = await asyncio.gather(*[
        field_info(link)
        for link
        in links
    ])  # runs the field_info coroutines concurrently

4) Pass the top-level coroutine to asyncio.run() (available since Python 3.7):

asyncio.run(main())
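
Steps 3 and 4 can be tried without any network access. The following is only a rough sketch: fake_field_info, demo, the example URLs and the 0.3 s delay are made-up placeholders, and asyncio.sleep() stands in for the awaited aiohttp request. It shows that the waits overlap and that the results come back in the order of the input links.

```python
import asyncio
import time


async def fake_field_info(link, delay=0.3):
    # Hypothetical stand-in for field_info(): asyncio.sleep() plays the role
    # of the awaited HTTP request so the sketch runs offline.
    await asyncio.sleep(delay)
    return {'Link': link}


async def demo():
    links = [
        'https://example.com/1',
        'https://example.com/2',
        'https://example.com/3',
    ]
    started = time.perf_counter()
    # gather() schedules all coroutines together and, once awaited,
    # returns their results in the same order as the inputs.
    results = await asyncio.gather(*[fake_field_info(link) for link in links])
    elapsed = time.perf_counter() - started
    return results, elapsed


# asyncio.run() returns whatever the top-level coroutine returns.
results, elapsed = asyncio.run(demo())
print(results)
print(f'{elapsed:.2f}s')  # close to one delay rather than three
```

With real requests the gain depends on how long the server takes to respond; the BeautifulSoup parsing itself still runs synchronously inside each coroutine.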

Upvotes: 3
