Reputation: 73
I'm attempting to download files from a page that's constructed almost entirely in JS. Here's the setup of the situation and what I've managed to accomplish.
The page itself takes upward of 5 minutes to load. Once loaded, there are 45,135 links (JS buttons). I need a subset of 377 of those. Then, one at a time (or using ASYNC), click those buttons to initiate the download, rename the download, and save it to a place that will keep the download even after the code has completed.
Here's the code I have and what it manages to do:
import asyncio
from playwright.async_api import async_playwright
from pathlib import Path
path = Path().home() / 'Downloads'
timeout = 300000 # 5 minute timeout
async def main():
async with async_playwright() as p:
browser = await p.chromium.launch()
context = await browser.new_context()
page = await context.new_page()
await page.goto("https://my-fun-page.com", timeout=timeout)
await page.wait_for_selector('ul.ant-list-items', timeout=timeout) # completely load the page
downloads = page.locator("button", has=page.locator("span", has_text="_Insurer_")) # this is the list of 377 buttons I care about
# texts = await downloads.all_text_contents() # just making sure I got what I thought I got
count = await downloads.count() # count = 377.
# Starting here is where I can't follow the API
for i in range(count):
print(f"Starting download {i}")
await downloads.nth(i).click(timeout=0)
page.on("download", lambda download: download.save_as(path / download.suggested_filename))
print("\tDownload acquired...")
await browser.close()
asyncio.run(main())
UPDATE: 2022/07/15 15:45 CST - Updated code above to reflect something that's closer to working than previously but still not doing what I'm asking.
The code above is actually iterating over the locator object elements and performing the downloads. However, the page.on("download")
step isn't working. The files are not showing up in my Downloads folder after the task is completed. Thoughts on where I'm missing the mark?
Python 3.10.5 Current public version of playwright
Upvotes: 0
Views: 1599
Reputation: 3757
download.save_as
returns a coroutine which you need
to await. Since there is no such thing as an "aysnc lambda
", and
that coroutines can only be awaited inside async functions, you
cannot use lambda
here. You need to create a separate async function,
and await download.save_as
.page.on
. After registering it once, the callable will be called automatically for all "download" events.page.on
before the download actually happens (or before the event fires, in general). It's often best to shift these calls right after you create the page using .new_page()
.A Better Solution
These were the obvious mistakes in your approach, fixing them should make it work. However, since you know exactly when the downloads are going to take place (after you click downloads.nth(i)
), I would suggest using expect_download instead. This will make sure that the file is completely downloaded before your main program continues (callables registered with events using page.on
are not awaited). Your code will somewhat become like this:
for i in range(count):
print(f"Starting download {i}")
async with page.expect_download() as download_handler:
await downloads.nth(i).click(timeout=0)
download = await download_handler.value
await download.save_as(path + f'\\{download.suggested_filename}')
print("\tDownload acquired...")
Upvotes: 2