Reputation: 83
I want to open multiple URLs at once using Playwright for Python, but I am struggling to figure out how. This is from the async documentation:
async def main():
    async with async_playwright() as p:
        for browser_type in [p.chromium, p.firefox, p.webkit]:
            browser = await browser_type.launch()
            page = await browser.newPage()
            await page.goto("https://scrapingant.com/")
            await page.screenshot(path=f"scrapingant-{browser_type.name}.png")
            await browser.close()

asyncio.get_event_loop().run_until_complete(main())
This opens each browser_type sequentially. How would I run them in parallel instead? And how would I do something similar with a list of URLs?
I tried doing this:
urls = [
    "https://scrapethissite.com/pages/ajax-javascript/#2015",
    "https://scrapethissite.com/pages/ajax-javascript/#2014",
]

async def main(url):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        page = await browser.newPage()
        await page.goto(url)
        await browser.close()

async def go_to_url():
    tasks = [main(url) for url in urls]
    await asyncio.wait(tasks)

go_to_url()
But that gave me the following error:
92: RuntimeWarning: coroutine 'go_to_url' was never awaited
go_to_url()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Upvotes: 8
Views: 5532
Reputation: 114
I struggled to get the original code to work, even with @hardkoded's answer. Using Python 3.11, I found that the following code works. I open each URL in the same context, so that only one browser window is opened.
import asyncio
from playwright.async_api import async_playwright

urls = [
    "https://scrapethissite.com/pages/ajax-javascript/#2015",
    "https://scrapethissite.com/pages/ajax-javascript/#2014",
    "https://scrapethissite.com/pages/ajax-javascript/#2013",
]

async def get_detail(context, url):
    # Each URL gets its own page (tab) inside the shared browser context.
    page = await context.new_page()
    await page.goto(url)
    await page.wait_for_load_state(state="networkidle")
    await page.wait_for_timeout(1000)
    # close() is a coroutine method: it must be called and awaited.
    await page.close()

async def open_new_pages(context, urls):
    # Creating tasks: https://docs.python.org/3.11/library/asyncio-task.html#creating-tasks
    background_tasks = set()
    for url in urls:
        task = asyncio.create_task(
            get_detail(context, url)
        )
        background_tasks.add(task)
        # task.add_done_callback(background_tasks.discard)
        # The line above produced an error for me, since the set then gets
        # changed while the loop below is iterating over it.

    # Await each of the tasks:
    for t in background_tasks:
        await t

async def main(urls):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context()
        await open_new_pages(context, urls)

asyncio.run(main(urls))
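If the only goal is to wait for every page to finish, `asyncio.gather` can stand in for the manual task set and the loop that awaits it. Below is a minimal sketch of that pattern. Note that `fetch_title` and `open_all` are hypothetical names, and `fetch_title` just sleeps instead of driving Playwright, so the snippet runs on its own:

```python
import asyncio
import time


async def fetch_title(url, delay=0.1):
    # Hypothetical stand-in for get_detail(context, url):
    # it sleeps instead of loading a real page.
    await asyncio.sleep(delay)
    return f"visited {url}"


async def open_all(urls):
    # gather() wraps each coroutine in a task, runs them concurrently,
    # and returns the results in the same order as the inputs.
    return await asyncio.gather(*(fetch_title(url) for url in urls))


start = time.perf_counter()
results = asyncio.run(open_all(["a", "b", "c"]))
elapsed = time.perf_counter() - start

print(results)
# The sleeps overlap, so this should be close to one delay rather than three.
print(f"elapsed: {elapsed:.2f}s")
```

In the real script, the body of `fetch_title` would be the Playwright calls from `get_detail`, with the shared `context` passed in as an extra argument.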
I believe task groups (https://docs.python.org/3.11/library/asyncio-task.html#task-groups), which were added in Python 3.11, are the more modern way to do this:
import asyncio
from playwright.async_api import async_playwright

urls = [
    "https://scrapethissite.com/pages/ajax-javascript/#2015",
    "https://scrapethissite.com/pages/ajax-javascript/#2014",
    "https://scrapethissite.com/pages/ajax-javascript/#2013",
]

async def get_detail(context, url):
    # Each URL gets its own page (tab) inside the shared browser context.
    page = await context.new_page()
    await page.goto(url)
    await page.wait_for_load_state(state="networkidle")
    await page.wait_for_timeout(1000)
    # close() is a coroutine method: it must be called and awaited.
    await page.close()

async def open_new_pages(context, urls):
    # The task group waits for all of its tasks when the block exits.
    async with asyncio.TaskGroup() as tg:
        for url in urls:
            tg.create_task(
                get_detail(context, url)
            )

async def main(urls):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context()
        await open_new_pages(context, urls)

asyncio.run(main(urls))
Upvotes: 2
Reputation: 21607
I believe you need to call your go_to_url function using the same recipe:
asyncio.get_event_loop().run_until_complete(go_to_url())
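The warning appears because calling go_to_url() on its own only creates a coroutine object; nothing runs until an event loop drives it. Below is a minimal sketch of the difference. It uses a hypothetical `fake_visit` coroutine that sleeps instead of launching a browser, so it runs without Playwright. It also uses `asyncio.run`, which on Python 3.7+ is the usual shorthand for the `get_event_loop().run_until_complete(...)` recipe, and `asyncio.gather` in place of `asyncio.wait`, because since Python 3.11 `asyncio.wait` no longer accepts bare coroutine objects:

```python
import asyncio

urls = [
    "https://scrapethissite.com/pages/ajax-javascript/#2015",
    "https://scrapethissite.com/pages/ajax-javascript/#2014",
]


async def fake_visit(url):
    # Hypothetical stand-in for the Playwright work done in main(url).
    await asyncio.sleep(0.01)
    return url


async def go_to_url():
    # gather() schedules every coroutine concurrently and waits for all of them.
    return await asyncio.gather(*(fake_visit(url) for url in urls))


# A bare go_to_url() call would only build a coroutine object and trigger
# the "never awaited" warning; asyncio.run() actually executes it.
visited = asyncio.run(go_to_url())
print(visited)
```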
Upvotes: 1