Chris

Reputation: 2735

How to save web page as image using python

I am using python to create a "favorites" section of a website. Part of what I want to do is grab an image to put next to their link. So the process would be that the user puts in a URL and I go grab a screenshot of that page and display it next to the link. Easy enough?

I downloaded pywebshot, and it works great from the terminal on my local box. However, when I run it on the server, I get a segmentation fault with the following output:

/usr/lib/pymodules/python2.6/gtk-2.0/gtk/__init__.py:57: GtkWarning: could not open display
  warnings.warn(str(e), _gtk.Warning)
./pywebshot.py:16: Warning: invalid (NULL) pointer instance
  self.parent = gtk.Window(gtk.WINDOW_TOPLEVEL)
./pywebshot.py:16: Warning: g_signal_connect_data: assertion `G_TYPE_CHECK_INSTANCE (instance)' failed
  self.parent = gtk.Window(gtk.WINDOW_TOPLEVEL)
./pywebshot.py:49: GtkWarning: Screen for GtkWindow not set; you must always set
a screen for a GtkWindow before using the window
  self.parent.show_all()
./pywebshot.py:49: GtkWarning: gdk_screen_get_default_colormap: assertion `GDK_IS_SCREEN (screen)' failed
  self.parent.show_all()
./pywebshot.py:49: GtkWarning: gdk_colormap_get_visual: assertion `GDK_IS_COLORMAP (colormap)' failed
  self.parent.show_all()
./pywebshot.py:49: GtkWarning: gdk_screen_get_root_window: assertion `GDK_IS_SCREEN (screen)' failed
  self.parent.show_all()
./pywebshot.py:49: GtkWarning: gdk_window_new: assertion `GDK_IS_WINDOW (parent)' failed
  self.parent.show_all()
Segmentation fault

I know that some things can't run in a pts environment, but honestly that's a little beyond me right now. If I need to somehow make my pts connection look like a tty, I can try it. But at this point I'm not even sure what's going on, and I admit it's a bit over my head. Any help would be greatly appreciated.

Also, if there's a web service to which I can pass a URL and get back an image, that would work just as well. I am NOT married to the idea of pywebshot.

I do know that the server I'm on is running X and has all the necessary python modules installed.

Thanks in advance.

Upvotes: 2

Views: 13198

Answers (4)

Riccardo Volpe

Reputation: 1633

This is the code I used to take a screenshot of the whole webpage, including the parts you have to scroll to:

from PIL import Image
from io import BytesIO
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import logging
import os
import time

# Set default download folder for ChromeDriver
videos_folder = r"./download"
if not os.path.exists(videos_folder):
    os.makedirs(videos_folder)
prefs = {"download.default_directory": videos_folder}

def open_url(address):
    # SELENIUM SETUP
    logging.getLogger('WDM').setLevel(logging.WARNING)  # just to hide less relevant webdriver-manager messages
    chrome_options = Options()
    chrome_options.add_argument("--headless=new")  # Options.headless was deprecated and later removed from Selenium
    chrome_options.add_experimental_option("prefs", prefs)
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
    driver.implicitly_wait(1)
    driver.maximize_window()
    driver.get(address)
    driver.set_window_size(1920, 1080)  # to set the screenshot width
    save_screenshot(driver, '{}/Screenshot.png'.format(videos_folder))
    driver.quit()

def save_screenshot(driver, file_name):
    height, width = scroll_down(driver)
    driver.set_window_size(width, height)
    img_binary = driver.get_screenshot_as_png()
    img = Image.open(BytesIO(img_binary))
    img.save(file_name)
    print("Screenshot saved!")

def scroll_down(driver):
    total_width = driver.execute_script("return document.body.offsetWidth")
    total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
    viewport_width = driver.execute_script("return document.body.clientWidth")
    viewport_height = driver.execute_script("return window.innerHeight")

    rectangles = []

    i = 0
    while i < total_height:
        ii = 0
        top_height = i + viewport_height

        if top_height > total_height:
            top_height = total_height

        while ii < total_width:
            top_width = ii + viewport_width

            if top_width > total_width:
                top_width = total_width

            rectangles.append((ii, i, top_width, top_height))

            ii = ii + viewport_width

        i = i + viewport_height

    # Scroll through the page one viewport at a time; this gives content that
    # loads on scroll a chance to render before the full-page screenshot
    for index, rectangle in enumerate(rectangles):
        if index > 0:  # the first tile is already in view, no scroll needed
            driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
            time.sleep(0.5)

    return total_height, total_width

open_url("https://stackoverflow.com/questions/4091940/how-to-save-web-page-as-image-using-python")

Here is the resulting screenshot:

[Image: whole webpage screenshot]
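The tile computation inside scroll_down can be pulled out into a pure function, which makes it easy to check which positions the loop scrolls to. A minimal sketch; the name viewport_tiles is hypothetical and not part of the answer above or of Selenium. It builds the same (left, top, right, bottom) tuples that scroll_down appends to rectangles, clipping the last row and column to the page size:

```python
def viewport_tiles(total_width, total_height, viewport_width, viewport_height):
    """Split a page into viewport-sized rectangles, row by row.

    Each tuple is (left, top, right, bottom). The last row and column are
    clipped to the page size, like the rectangles built in scroll_down.
    Hypothetical helper, not part of Selenium.
    """
    if viewport_width <= 0 or viewport_height <= 0:
        raise ValueError("viewport dimensions must be positive")
    tiles = []
    for top in range(0, total_height, viewport_height):
        bottom = min(top + viewport_height, total_height)
        for left in range(0, total_width, viewport_width):
            right = min(left + viewport_width, total_width)
            tiles.append((left, top, right, bottom))
    return tiles


# A 1920x2500 page seen through a 1920x1080 viewport needs three scroll positions
print(viewport_tiles(1920, 2500, 1920, 1080))
# [(0, 0, 1920, 1080), (0, 1080, 1920, 2160), (0, 2160, 1920, 2500)]
```

With a live driver, calling window.scrollTo(left, top) for each tuple has the same effect as the scrolling loop in scroll_down.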

IMPORTANT UPDATE:

The current stable release of ChromeDriver is 114.0.5735.90, which is not compatible with the current version of Chrome (125.0.6422.141, as of 2024.06.04), so the script above will not work as-is.
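ChromeDriver is released alongside Chrome, and a driver whose major version differs from the browser's (114 vs 125 here) typically fails to start with a "session not created" error. The mismatch can be detected before launching the browser. A minimal sketch, assuming you already have the output of `google-chrome --version` and `chromedriver --version` as strings; the helper names are hypothetical, and the binary names may differ on your system:

```python
import re

# Matches a four-part Chrome/ChromeDriver version such as 125.0.6422.141
VERSION_RE = re.compile(r"(\d+)\.\d+\.\d+\.\d+")


def major_version(version_output):
    """Return the major version from `google-chrome --version` or
    `chromedriver --version` output (hypothetical helper)."""
    match = VERSION_RE.search(version_output)
    if match is None:
        raise ValueError("no version number found in: {!r}".format(version_output))
    return int(match.group(1))


def versions_compatible(chrome_output, driver_output):
    """True when the browser and the driver share the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


print(versions_compatible("Google Chrome 125.0.6422.141", "ChromeDriver 114.0.5735.90"))   # False
print(versions_compatible("Google Chrome 125.0.6422.141", "ChromeDriver 125.0.6422.141"))  # True
```

In practice the strings could come from something like subprocess.run(["google-chrome", "--version"], capture_output=True, text=True).stdout.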

For now, the fix unfortunately has to be done manually: download the ChromeDriver build that matches the current stable version of Chrome from here, as shown in the image below (for Chrome 125.0.6422.141):

[Image: ChromeDriver download for Chrome 125.0.6422.141]

Once the chromedriver-linux64.zip archive has been saved, rename the extracted folder to the matching Chrome version (125.0.6422.141) and move it to ~/.wdm/drivers/chromedriver/linux64/, so that the driver ends up at ~/.wdm/drivers/chromedriver/linux64/125.0.6422.141/chromedriver. Then modify the script by replacing driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options) with driver = webdriver.Chrome(service=Service(os.path.expanduser("~/.wdm/drivers/chromedriver/linux64/125.0.6422.141/chromedriver")), options=chrome_options). Passing the path through Service works on Selenium 4 (the executable_path argument was removed in Selenium 4.10), and os.path.expanduser is needed because the ~ is not expanded for you.

That's all!
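The manual extract, rename and move step described above can also be scripted with the standard library. A minimal sketch, assuming the archive has already been downloaded and contains a chromedriver-linux64/chromedriver entry (the layout of chromedriver-linux64.zip); install_chromedriver is a hypothetical helper, and unlike the manual step it copies only the binary, not the whole folder. zipfile does not restore Unix permissions, so the sketch marks the binary executable itself:

```python
import os
import shutil
import stat
import zipfile


def install_chromedriver(zip_path, chrome_version,
                         root="~/.wdm/drivers/chromedriver/linux64",
                         member="chromedriver-linux64/chromedriver"):
    """Extract chromedriver from zip_path into <root>/<chrome_version>/.

    Hypothetical helper: `member` assumes the layout of chromedriver-linux64.zip.
    Returns the absolute path of the installed executable.
    """
    target_dir = os.path.join(os.path.expanduser(root), chrome_version)
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, "chromedriver")
    # Copy the single binary out of the archive into the versioned folder
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    # zipfile does not preserve the executable bit, so set it explicitly
    mode = os.stat(target).st_mode
    os.chmod(target, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target
```

The returned path can then be passed to the driver, e.g. webdriver.Chrome(service=Service(install_chromedriver("chromedriver-linux64.zip", "125.0.6422.141")), options=chrome_options).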

Upvotes: 2

Kavitha

Reputation: 11

from selenium import webdriver
from xvfbwrapper import Xvfb

# Start a virtual X display so Firefox can run on a server without a monitor
d = Xvfb(width=400, height=400)
d.start()
browser = webdriver.Firefox()
url = "http://stackoverflow.com/questions/4091940/how-to-save-web-page-as-image-using-python"
browser.get(url)
destination = "screenshot_filename.png"  # Selenium always writes PNG data
if browser.save_screenshot(destination):
    print("File saved in the destination filename")
browser.quit()
d.stop()  # shut down the virtual display

Upvotes: 1

Chris

Reputation: 2735

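I found websnapr.com, a web service that will give you the image with just a little bit of work.

import subprocess
import urllib.parse
# Escape MYURL so its own ? and & characters don't break websnapr's query string
subprocess.check_call(['wget', '-O', MYFILENAME + '.png', 'http://images.websnapr.com/?url=' + urllib.parse.quote(MYURL, safe='') + '&size=s&nocache=82'])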

Easy as pie.
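If you would rather not depend on wget, the same request can be made with Python's standard library. A minimal sketch, assuming the websnapr endpoint still accepts the url, size and nocache parameters used in the wget command above; websnapr_url and save_thumbnail are hypothetical names. urlencode escapes the target URL so its own query string cannot clash with websnapr's parameters:

```python
import urllib.parse
import urllib.request

WEBSNAPR_ENDPOINT = "http://images.websnapr.com/"  # endpoint from the wget example


def websnapr_url(page_url, size="s", nocache="82"):
    """Build the thumbnail request URL with page_url fully escaped."""
    query = urllib.parse.urlencode({"url": page_url, "size": size, "nocache": nocache})
    return WEBSNAPR_ENDPOINT + "?" + query


def save_thumbnail(page_url, filename):
    """Download the thumbnail to filename (performs a network request)."""
    urllib.request.urlretrieve(websnapr_url(page_url), filename)
```

For example, save_thumbnail("http://example.com/?a=1&b=2", "example.png") requests the thumbnail with the whole target URL encoded as a single url parameter.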

Upvotes: 0

Paulo Scardine

Reputation: 77399

Let me guess: the server does not have an X server, right?

You may have to run a headless X server, such as Xvfb, to get this working.
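One way to do this on Debian/Ubuntu-style systems is the xvfb-run script from the xvfb package, which starts Xvfb, points DISPLAY at it for the child command, and shuts it down afterwards. A minimal sketch that wraps the command only when no DISPLAY is set; screenshot_command is a hypothetical helper, and the pywebshot.py arguments are assumptions (adjust them to however you invoke it). Note that a DISPLAY value being set does not guarantee the process is allowed to connect to that X server:

```python
import os


def screenshot_command(args, env=None, screen="1280x1024x24"):
    """Return args unchanged if a DISPLAY is configured; otherwise wrap them
    in xvfb-run so they run against a virtual framebuffer X server."""
    env = os.environ if env is None else env
    if env.get("DISPLAY"):
        return list(args)
    # --auto-servernum picks a free display number; --server-args sets the screen size
    return ["xvfb-run", "--auto-servernum",
            "--server-args=-screen 0 " + screen, *args]
```

Usage would be along the lines of subprocess.run(screenshot_command(["./pywebshot.py", "http://example.com"]), check=True).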

Upvotes: 0
