CFRJ

Reputation: 157

Set up asyncio with GET requests

I'm trying to get asyncio working with my web scraper. I've had it working before, but when I run it today, set_query does not return a string.

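import asyncio
from concurrent.futures import ThreadPoolExecutor

from bs4 import BeautifulSoup
from requests import get  # assumed: get() here is requests.get

async def set_query(company):
    with ThreadPoolExecutor(10) as executor:
        loop = asyncio.get_event_loop()

        # Create the query
        query = format_text(company)
        page = get(query)
        soup = BeautifulSoup(page.content, 'html.parser')
        # Get the address etc. as a string
        try:
            location = soup.find_all('address')[0].text
        except IndexError:
            # No <address> element: fall back to the first paragraph
            location = soup.find_all('p')[0].text
        # Site messages meaning "Your search for" / "No hits for": nothing found
        if "Din sökning på" in location or "Ingen träff på" in location:
            return

    return location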

def scrape():
    # Companies to scrape
    companies = getData()
    # Get the page
    count = 0
    for company in companies:
        try:
            location = set_query(company)
            print(location)
        except:
            print("")

        corp.update({company: get_adress(location)})
        save_to_excel()

def start_download():
    loop = asyncio.get_event_loop()

    future = asyncio.ensure_future(scrape())
    loop.run_until_complete(future)

The expected result is that the location returned from set_query is passed into get_adress, but what gets passed is not a string. The error message from get_adress is: "This is exception 'coroutine' object has no attribute 'replace'"
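The error can be reproduced in isolation. In this minimal sketch, set_query and get_adress are hypothetical stand-ins (not the real scraper), and get_adress is assumed to call str.replace() on its argument, as the error message suggests:

```python
import asyncio

async def set_query(company):
    # Hypothetical stand-in for the real scraper coroutine: just builds a string
    return f"address of {company}"

def get_adress(location):
    # Hypothetical stand-in: calls a str method, like the real get_adress seems to
    return location.replace(",", "")

result = set_query("Acme")  # no await: this only creates a coroutine object
print(type(result).__name__)  # coroutine

error_message = None
try:
    get_adress(result)
except AttributeError as exc:
    error_message = str(exc)
finally:
    result.close()  # close the never-awaited coroutine to avoid a RuntimeWarning

print(error_message)  # 'coroutine' object has no attribute 'replace'
```

The body of set_query never runs here; the caller only ever sees the coroutine object.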

Upvotes: 0

Views: 43

Answers (1)

RomanPerekhrest

Reputation: 92854

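Your code calls the coroutine function set_query(company) as if it were synchronous. Calling an async def function does not run its body; it returns a coroutine object, which is awaitable and only produces its return value when it is awaited. That coroutine object is what ends up in get_adress, hence the "'coroutine' object has no attribute 'replace'" error.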
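A minimal sketch of the difference, using a hypothetical stand-in for set_query: the coroutine object is awaitable, and awaiting it inside a running event loop yields the actual string.

```python
import asyncio
import inspect

async def set_query(company):
    # Hypothetical stand-in for the scraper coroutine
    await asyncio.sleep(0)  # pretend to do some asynchronous work
    return f"address of {company}"

async def main():
    coro = set_query("Acme")
    print(inspect.isawaitable(coro))  # True: a coroutine object is awaitable
    location = await coro  # awaiting runs the coroutine and gives its return value
    print(location)  # address of Acme
    return location

location = asyncio.run(main())
```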

Make scrape() a coroutine (async def) that awaits the result of the set_query(company) coroutine:

...

async def scrape():
    ...
    location = await set_query(company)
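Putting the pieces together, the whole flow could look roughly like the sketch below. It is not the original scraper: fetch_location is a hypothetical blocking stand-in for the get() + BeautifulSoup code, the company list is made up, and it assumes the blocking request should run in the ThreadPoolExecutor via loop.run_in_executor() (the question creates an executor but never uses it). asyncio.gather() runs the lookups concurrently, and asyncio.run() replaces the get_event_loop()/run_until_complete() pair.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def fetch_location(company):
    # Hypothetical blocking stand-in for get(query) + BeautifulSoup parsing;
    # returns None when the search has no hits, like the original set_query
    if company == "Unknown AB":
        return None
    return f"Storgatan 1, {company}"

async def set_query(company, executor):
    loop = asyncio.get_running_loop()
    # Run the blocking call in a worker thread so the event loop is not blocked
    return await loop.run_in_executor(executor, fetch_location, company)

async def scrape(companies):
    corp = {}
    with ThreadPoolExecutor(max_workers=10) as executor:
        # Start all lookups concurrently; gather keeps results in input order
        locations = await asyncio.gather(
            *(set_query(company, executor) for company in companies)
        )
    for company, location in zip(companies, locations):
        if location is not None:  # skip "no hits" instead of passing None on
            corp[company] = location
    return corp

def start_download():
    companies = ["Acme AB", "Unknown AB", "Example AB"]  # hypothetical input
    # asyncio.run creates the event loop, runs scrape() and closes the loop
    return asyncio.run(scrape(companies))

corp = start_download()
print(corp)
```

With a real site, the same structure lets up to ten requests be in flight at once while each result is still a plain string by the time it reaches get_adress.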

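Moreover, asyncio.ensure_future() requires a coroutine, a Future/Task, or another awaitable as its argument: https://docs.python.org/3/library/asyncio-future.html#asyncio.ensure_future. Because the original scrape() is a regular function, scrape() runs immediately and returns None, so ensure_future() raises a TypeError.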
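A small sketch of that behaviour, with hypothetical scrape_sync and scrape_async functions standing in for the two versions of scrape(): passing the None returned by a plain function fails, while a coroutine is wrapped in a Task.

```python
import asyncio

def scrape_sync():
    # A plain function: calling it runs the body immediately and returns None
    return None

async def scrape_async():
    # A coroutine function: calling it returns a coroutine object
    return "done"

async def main():
    results = {}
    try:
        asyncio.ensure_future(scrape_sync())  # the argument is None
    except TypeError as exc:
        results["sync_error"] = str(exc)
    task = asyncio.ensure_future(scrape_async())  # coroutine is wrapped in a Task
    results["is_task"] = isinstance(task, asyncio.Task)
    results["async_result"] = await task
    return results

results = asyncio.run(main())
print(results)
```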

Upvotes: 1
