chidori

Reputation: 1112

How to debug async aiohttp slowness

I have been playing around with the asyncio module recently. Below is the code I came up with for sending some requests in parallel. It works fine on my laptop (macOS), but the same code runs slowly on another machine (Ubuntu 18.04). On that slow Ubuntu 18.04 machine I installed a VirtualBox VM, also running Ubuntu 18.04, and to my surprise the code runs perfectly fine inside the VM. I have multiple versions of Python on the Ubuntu machine, and I am running this with 3.7.2. I am not sure how to narrow down the issue, so any help would be appreciated.

I am sure that it's not a network issue. On the physical Ubuntu machine this code takes ~130 seconds to complete, but inside the Ubuntu VM, where it works as expected, it takes less than 5 seconds.
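One way to start narrowing this down is to check whether the requests actually overlap. Below is a minimal, hypothetical sketch using only the standard library: `timed` wraps each coroutine and records its own duration, and `fake_request` is an assumed stand-in that sleeps instead of calling the real `fetch`. If the requests run concurrently, the total wall time should be close to the slowest single request; if something serializes them, it approaches the sum of all of them.

```python
import asyncio
import time


async def timed(label, coro):
    # Await any coroutine and report how long it took to complete.
    start = time.perf_counter()
    result = await coro
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return result, elapsed


async def fake_request(delay):
    # Hypothetical stand-in for fetch(session, url, app_id); swap in the real call.
    await asyncio.sleep(delay)
    return delay


async def main():
    delays = [0.1, 0.2, 0.3]
    start = time.perf_counter()
    results = await asyncio.gather(
        *(timed(f"request {i}", fake_request(d)) for i, d in enumerate(delays))
    )
    total = time.perf_counter() - start
    per_request = [elapsed for _, elapsed in results]
    # Concurrent requests: total is close to max(per_request).
    # Serialized requests: total approaches sum(per_request).
    print(f"total: {total:.2f}s, max: {max(per_request):.2f}s, "
          f"sum: {sum(per_request):.2f}s")
    return total, per_request


total, per_request = asyncio.run(main())
```

Running the same wrapper around the real `fetch` on both the bare-metal machine and the VM shows whether the 130 seconds come from a few very slow requests or from requests that no longer overlap.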

import aiohttp
import asyncio
import ssl
import time
from bs4 import BeautifulSoup


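async def get_app_updated_date(html_content):
    # Parse the page and pull out the "Updated" field. Parsing is CPU-bound and
    # never awaits, so while it runs it blocks the event loop.
    soup = BeautifulSoup(html_content, 'lxml')
    section_titles_divs = soup.select('div.hAyfc div.BgcNfc')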

    title_normalization = {
        'Updated': 'updated',
    }

    data = {
        'updated': None,
    }

    for title_div in section_titles_divs:
        section_title = title_div.string
        if section_title in title_normalization:
            title_key = title_normalization[section_title]
            value_div = title_div.next_sibling.select_one('span.htlgb')
            value = value_div.text
            data[title_key] = value
    return data


async def fetch(session, url, app_id):
    print(f'Fetching information for {app_id}')
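    # Note: ssl.SSLContext() builds a new context for every request and, with no
    # arguments, does not verify certificates.
    async with session.get(url, params={'id': app_id}, ssl=ssl.SSLContext()) as response: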
        html_resp = await response.text()
        app_lastupdated_date = await get_app_updated_date(html_resp)
        return {app_id: app_lastupdated_date}


async def main():
    url = 'https://play.google.com/store/apps/details'
    app_list = ['com.google.android.youtube',
                'com.whatsapp',
                'com.instagram.android',
                'com.google.android.apps.maps',
                'com.kiloo.subwaysurf',
                'com.halfbrick.fruitninjafree',
                'com.adobe.reader',
                'org.mozilla.firefox',
                'com.zeptolab.ctr.ads',
                'com.fingersoft.hillclimb']
    async with aiohttp.ClientSession() as session:
        url_requests = [fetch(session, url, app_id) for app_id in app_list]
        print(url_requests)
        results = await asyncio.gather(*url_requests)
        for r in results:
            print(r)
        print(f'Result size  = {len(results)}')


if __name__ == '__main__':
    start_time = time.perf_counter()
    # asyncio.run (Python 3.7+) creates the event loop, runs main() and closes the loop.
    asyncio.run(main())
    print(f'Script execution completed in: {time.perf_counter() - start_time} seconds')
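In the code above, `get_app_updated_date` is declared `async` but never awaits, so the BeautifulSoup parsing runs synchronously on the event loop. asyncio's debug mode logs a warning for any step that holds the loop longer than `loop.slow_callback_duration` (0.1 s by default), which can show whether blocking work is the bottleneck. The sketch below is illustrative only: `blocking_parse` and `fetch_and_parse` are hypothetical stand-ins that call `time.sleep` and `asyncio.sleep` instead of parsing real HTML and awaiting a real response, and the log records are collected in a list only so they can be counted.

```python
import asyncio
import logging
import time


class Collector(logging.Handler):
    # Keep formatted log messages in memory so we can inspect them afterwards.
    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def blocking_parse():
    # Hypothetical stand-in for BeautifulSoup parsing: blocks the loop for 0.2 s.
    time.sleep(0.2)
    return "parsed"


async def fetch_and_parse():
    await asyncio.sleep(0.01)  # stand-in for awaiting the HTTP response
    return blocking_parse()    # runs synchronously on the event loop


async def main():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.1  # seconds; this is also the default
    return await asyncio.gather(*(fetch_and_parse() for _ in range(3)))


collector = Collector()
logging.getLogger("asyncio").addHandler(collector)
results = asyncio.run(main(), debug=True)

# In debug mode asyncio logs "Executing <handle> took N seconds" for slow steps.
slow = [m for m in collector.messages if m.startswith("Executing")]
print(f"{len(slow)} slow callbacks reported")
```

On the real script, the equivalent is `asyncio.run(main(), debug=True)` or setting the `PYTHONASYNCIODEBUG=1` environment variable, then reading the `Executing ... took ... seconds` warnings on each machine.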

UPDATE: As advised, I am attaching my profiler report. I am not able to understand the jargon, so again I am seeking the expertise of the folks on this forum.

[Profiler screenshots 1 and 2: images not available]

Upvotes: 3

Views: 853

Answers (1)

Slam

Reputation: 8572

I suggest narrowing it down to at least a single function. Use the standard profiling modules (e.g. cProfile) or the profiler in your IDE (PyCharm, for example, has pretty good tools) to identify the problematic function.
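A minimal sketch of that suggestion using the standard-library `cProfile` and `pstats` modules; `work` and `main` are hypothetical coroutines standing in for the real ones. For an asyncio program, a large cumulative time in the selector's poll call (for example `select.epoll.poll` on Linux) generally means the process is idle, waiting on I/O, while time concentrated in your own functions points at CPU-bound work.

```python
import asyncio
import cProfile
import io
import pstats


async def work(n):
    # Hypothetical workload: a short wait followed by some CPU-bound arithmetic.
    await asyncio.sleep(0.01)
    return sum(i * i for i in range(n))


async def main():
    return await asyncio.gather(*(work(10_000) for _ in range(5)))


profiler = cProfile.Profile()
profiler.enable()
results = asyncio.run(main())
profiler.disable()

# Sort by cumulative time so the calls that dominate wall time appear first.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf).sort_stats("cumulative")
stats.print_stats(10)  # only the top 10 entries
report = buf.getvalue()
print(report)
```

Comparing the top entries of this report between the bare-metal host and the VM is usually more informative than reading either one in isolation.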

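But IMO, this actually does look like a network issue. The Python environment, event loop implementation and system bindings are common to both setups; what differs is the networking layer. Running under Ubuntu on bare metal involves: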

  • python environment
  • event loop implementation
  • bindings to system packages
  • Ubuntu networking (including DNS resolver)

Ubuntu in a VM involves:

  • python environment
  • event loop implementation
  • bindings to system packages
  • bridged network from the VM to the host system (depends on VM settings, though)
  • host networking (including DNS resolver), which here is the same Ubuntu machine
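Since DNS resolution is one of the layers that differs, a quick check is to time the lookup on both the bare-metal host and the VM. This is a minimal sketch using only `socket.getaddrinfo`; the helper name `time_dns_lookup` is hypothetical. A lookup that takes seconds on one machine but milliseconds on the other would point at the resolver configuration, and the returned address list also shows whether IPv6 addresses come back, which is worth comparing between the two.

```python
import socket
import time


def time_dns_lookup(host, port=443):
    """Resolve host the way a TCP client would; return (seconds, addresses)."""
    start = time.perf_counter()
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    elapsed = time.perf_counter() - start
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses


if __name__ == "__main__":
    # "localhost" resolves without network access; on the real machines try
    # "play.google.com" instead and compare the timings.
    elapsed, addresses = time_dns_lookup("localhost")
    print(f"resolved localhost in {elapsed * 1000:.1f} ms -> {addresses}")
```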

Upvotes: 2
