Zach Johnson

Reputation: 2247

Extract images from playwright page without requesting them again?

Let's say I've requested a page and it's fully loaded. Is it possible to save the images from the rendered/loaded page without sending another request for each image? The goal is to avoid collecting the individual image URLs and then hammering the server a second time for every image.

Upvotes: 5

Views: 2708

Answers (2)

aikipooh

Reputation: 243

In case someone else comes here with the same problem:

Mine is a dynamic GIF, so a screenshot won't help much.

I therefore solved this by intercepting the image responses the browser receives while the page loads, so nothing is requested twice (adapted from a blog post on the topic):

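from playwright.sync_api import sync_playwright

def on_response(response):
  # Called for every response the page receives while loading.
  if not response.ok:
    return  # if you don't want to ignore errors, add error handling here
  if response.request.resource_type == "image":
    # Every image goes to the same path here; use a per-image path if the page has several.
    with open(PATH_TO_STORE_IMAGE_AT, "wb") as f:
      f.write(response.body())  # body() is a method that returns the raw bytes

with sync_playwright() as p:
  browser = p.chromium.launch()  # or firefox or webkit
  page = browser.new_page()
  page.on("response", on_response)  # register the handler before navigating
  page.goto(URL_OF_PAGE_CONTAINING_IMAGE)
  browser.close()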

This could be fleshed out a bunch but should demonstrate the overall idea.
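
As one way of fleshing it out, here is a rough sketch that writes every intercepted image to its own file instead of a single fixed path. The names image_filename, save_images and the "images" output directory are made up for this example, and waiting for "networkidle" is only an assumption about when the page counts as loaded; lazily loaded images may still be missed, and response.body() can raise for some responses.

```python
import hashlib
import mimetypes
from pathlib import Path
from urllib.parse import urlparse


def image_filename(url: str, content_type: str = "") -> str:
    """Build a unique, filesystem-safe file name for an image URL (hypothetical helper)."""
    # Hash the full URL so two images with the same basename do not collide.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:12]
    suffix = Path(urlparse(url).path).suffix
    if not suffix:
        # No extension in the URL path: fall back to the Content-Type header.
        mime = content_type.split(";")[0].strip()
        suffix = (mimetypes.guess_extension(mime) if mime else None) or ".bin"
    return digest + suffix


def save_images(page_url: str, out_dir: str = "images") -> list:
    """Load page_url once and save every successful image response to out_dir."""
    # Imported here so image_filename stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []

    def on_response(response):
        # Skip failed responses and anything that is not an image.
        if not response.ok or response.request.resource_type != "image":
            return
        name = image_filename(response.url, response.headers.get("content-type", ""))
        path = target / name
        path.write_bytes(response.body())  # bytes the browser already received
        saved.append(path)

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.on("response", on_response)  # register before navigating
        page.goto(page_url, wait_until="networkidle")
        browser.close()
    return saved
```

Calling save_images(URL_OF_PAGE_CONTAINING_IMAGE) returns the list of paths that were written. Hashing the URL keeps the names unique without having to sanitize arbitrary URL paths.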

Upvotes: 3

yu li

Reputation: 1

Translate this Java code to Python. It takes a screenshot of the element and saves it as a JPEG:

byte[] slideBg = page.locator("xpath=id(\"slideBgWrap\")").screenshot();
ByteArrayInputStream inStreambj = new ByteArrayInputStream(slideBg);
BufferedImage newImage = ImageIO.read(inStreambj);
ImageIO.write(newImage, "jpg", new File(url + "outputImage.jpg"));
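
A possible Python equivalent of the Java above, as a sketch rather than a tested port. The function name screenshot_element is made up here, and page is assumed to be an already-open Playwright Page. Playwright can encode the screenshot as JPEG itself, so the ImageIO round trip is not needed. Keep in mind that a screenshot captures the rendered pixels, not the original image file, so it will not preserve an animated GIF.

```python
from pathlib import Path


def screenshot_element(page, selector: str, out_path: str) -> Path:
    """Save a JPEG screenshot of the element matching selector (hypothetical helper)."""
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Locator.screenshot() returns the encoded image as bytes.
    data = page.locator(selector).screenshot(type="jpeg")
    path.write_bytes(data)
    return path
```

With the selector from the Java snippet, the call would be screenshot_element(page, 'xpath=id("slideBgWrap")', "outputImage.jpg").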

Upvotes: -1
