Reputation: 157
I'm trying to get asyncio working with my web scraper. I had it working before, but when I run it today set_query does not return a string.
async def set_query(company):
    with ThreadPoolExecutor(10) as executor:
        loop = asyncio.get_event_loop()
        #Create Query
        query = format_text(company)
        page = get(query)
        soup = BeautifulSoup(page.content, 'html.parser')
        #Get adress etc as string
        try:
            location = soup.find_all('address')[0].text
        except:
            location = soup.find_all('p')[0].text
        if "Din sökning på" in location or "Ingen träff på" in location:
            return
        return location
def scrape():
    #Companies to scrape
    companies = getData()
    #Get Page
    count = 0
    for company in companies:
        try:
            location = set_query(company)
            print(location)
        except:
            print("")
        corp.update({company:get_adress(location)})
    save_to_excel()
def start_download():
    loop = asyncio.get_event_loop()
    future = asyncio.ensure_future(scrape())
    loop.run_until_complete(future)
The expected result is that the location returned from set_query is passed to get_adress, but what gets passed is not a string. The error message from get_adress is: "This is exception 'coroutine' object has no attribute 'replace'"
Upvotes: 0
Views: 43
Reputation: 92854
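Your code calls the asynchronous coroutine set_query(company) in a synchronous way.
A coroutine is an awaitable object: calling it without await returns a coroutine object rather than its result, which is why get_adress receives something with no replace attribute.
Make your def scrape() a coroutine that awaits the result of the other coroutine, set_query(company):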
...
async def scrape():
    ...
    location = await set_query(company)
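A minimal, self-contained sketch of that change. The set_query below is a hypothetical stand-in that does no scraping and just returns a string, and the companies argument and the returned dict are assumptions made for illustration, not part of the original code:

```python
import asyncio


async def set_query(company):
    # Hypothetical stand-in for the real scraper: no network access,
    # it only yields to the event loop and returns a string.
    await asyncio.sleep(0)
    return f"Address of {company}"


async def scrape(companies):
    results = {}
    for company in companies:
        # Without await, set_query(company) would be a coroutine object,
        # which has no str methods such as .replace().
        location = await set_query(company)
        results[company] = location.replace("Address of ", "")
    return results


if __name__ == "__main__":
    print(asyncio.run(scrape(["Acme AB", "Example AB"])))
```

Because location is now the awaited result, string methods such as replace work on it as they would on any other str.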
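Moreover, asyncio.ensure_future() expects a coroutine, a Future/Task instance, or another awaitable as its main argument, so scrape() has to be a coroutine for ensure_future(scrape()) to work: https://docs.python.org/3/library/asyncio-future.html#asyncio.ensure_future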
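As a rough illustration of the entry point, here is a sketch that assumes scrape() has already been turned into a coroutine. The bodies of scrape and not_a_coroutine are hypothetical placeholders, and asyncio.run() is swapped in for the get_event_loop()/run_until_complete() pair as one possible simplification:

```python
import asyncio


async def scrape():
    # Hypothetical placeholder body; the real scrape() would await set_query().
    await asyncio.sleep(0)
    return "done"


def start_download():
    # asyncio.run() creates an event loop, runs the coroutine to completion,
    # and closes the loop afterwards.
    return asyncio.run(scrape())


def not_a_coroutine():
    # A plain function returning None, like the original synchronous scrape().
    return None


if __name__ == "__main__":
    print(start_download())
    try:
        # Passing a non-awaitable value is rejected with a TypeError.
        asyncio.ensure_future(not_a_coroutine())
    except TypeError as exc:
        print("TypeError:", exc)
```

With the original synchronous scrape(), the call scrape() runs the whole function and returns None, so ensure_future has nothing awaitable to schedule.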
Upvotes: 1