A random coder

Reputation: 473

Does anyone know how I can make this code run faster?

I have finished making a web scraper that goes through Roblox and picks out the usernames of the first 1000 accounts ever made. Fortunately, it works! However, there is a downside.

My problem is that this code takes absolutely FOREVER to finish. Does anyone know a more efficient way to write the same thing, or is this just the base speed of Python Requests? Code is below :)

PS: The code took 5 minutes to go through only 600 accounts.

import requests
from bs4 import BeautifulSoup

def find_account(id):
    # Fetch the profile page for this user ID
    r = requests.get(f'https://web.roblox.com/users/{id}/profile')
    if r.status_code == 200:
        soup = BeautifulSoup(r.text, 'html.parser')
        # The username is the first child of the first <h2> on the profile page
        stuff = soup.find_all('h2')
        special = stuff[0].contents[0]
        return str(special) + '      ID: {}'.format(id)
    else:
        return None

users = []
for i in range(10000,11000):
    users.append(find_account(i))
    print(f'{i-9999} out of 1000 done')
# There is more below this, but that is just the GUI and stuff; this is the part that gets the usernames.

Upvotes: 0

Views: 119

Answers (1)

Roman Czerwinski

Reputation: 559

Try asyncio to do the same thing asynchronously. The advantage of async Python is that you do not have to wait for one HTTP call to finish before starting the next. If the syntax here is confusing, an article on writing concurrent/parallel code in Python is worth a read.

Here it is refactored to run in async mode:

import asyncio
import aiohttp
from bs4 import BeautifulSoup


async def find_account(id, session):
    # Reuse one ClientSession for every request instead of opening a new connection each time
    async with session.get(f'https://web.roblox.com/users/{id}/profile') as r:
        if r.status == 200:
            response_text = await r.read()
            soup = BeautifulSoup(response_text, 'html.parser')
            # The username is the first child of the first <h2> on the profile page
            stuff = soup.find_all('h2')
            special = stuff[0].contents[0]
            print(f'{id-9999} out of 1000 done')
            return str(special) + '      ID: {}'.format(id)
        else:
            return None


async def crawl_url_id_range(min_id, max_id):
    # Schedule every request as a task so they all run concurrently,
    # then wait for all of them to finish
    tasks = []
    async with aiohttp.ClientSession() as session:
        for id in range(min_id, max_id):
            tasks.append(asyncio.ensure_future(find_account(id=id, session=session)))
        return await asyncio.gather(*tasks)


users = asyncio.run(crawl_url_id_range(min_id=10000, max_id=11000))

I tested this and the above code works fairly well.
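If firing 1000 requests at once gets you throttled or rejected, you can cap how many are in flight at a time. Below is a minimal sketch of that idea using asyncio.Semaphore, reusing the find_account coroutine from above; the name crawl_url_id_range_limited and the limit of 50 are arbitrary assumptions of mine, not anything Roblox documents.

import asyncio
import aiohttp

# Hypothetical variant: cap concurrency at `limit` simultaneous requests.
# find_account is the coroutine defined in the answer above.
async def crawl_url_id_range_limited(min_id, max_id, limit=50):
    semaphore = asyncio.Semaphore(limit)

    async def bounded_find(id, session):
        # Only `limit` coroutines can be inside this block at any moment
        async with semaphore:
            return await find_account(id=id, session=session)

    async with aiohttp.ClientSession() as session:
        tasks = [bounded_find(id, session) for id in range(min_id, max_id)]
        return await asyncio.gather(*tasks)

# users = asyncio.run(crawl_url_id_range_limited(min_id=10000, max_id=11000))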

Upvotes: 1
