Reputation: 2735
I am using python to create a "favorites" section of a website. Part of what I want to do is grab an image to put next to their link. So the process would be that the user puts in a URL and I go grab a screenshot of that page and display it next to the link. Easy enough?
I have currently downloaded pywebshot and it works great from my terminal on my local box. However, when I put it on the server, I get a Segmentation Fault with the following traceback:
/usr/lib/pymodules/python2.6/gtk-2.0/gtk/__init__.py:57: GtkWarning: could not open display
warnings.warn(str(e), _gtk.Warning)
./pywebshot.py:16: Warning: invalid (NULL) pointer instance
self.parent = gtk.Window(gtk.WINDOW_TOPLEVEL)
./pywebshot.py:16: Warning: g_signal_connect_data: assertion `G_TYPE_CHECK_INSTANCE (instance)' failed
self.parent = gtk.Window(gtk.WINDOW_TOPLEVEL)
./pywebshot.py:49: GtkWarning: Screen for GtkWindow not set; you must always set
a screen for a GtkWindow before using the window
self.parent.show_all()
./pywebshot.py:49: GtkWarning: gdk_screen_get_default_colormap: assertion `GDK_IS_SCREEN (screen)' failed
self.parent.show_all()
./pywebshot.py:49: GtkWarning: gdk_colormap_get_visual: assertion `GDK_IS_COLORMAP (colormap)' failed
self.parent.show_all()
./pywebshot.py:49: GtkWarning: gdk_screen_get_root_window: assertion `GDK_IS_SCREEN (screen)' failed
self.parent.show_all()
./pywebshot.py:49: GtkWarning: gdk_window_new: assertion `GDK_IS_WINDOW (parent)' failed
self.parent.show_all()
Segmentation fault
I know that some things can't run in a pts environment, but honestly that's a little beyond me right now. If I need to somehow pretend that my pts connection is tty, I can try it. But at this point I'm not even sure what's going on and I admit it's a bit over my head. Any help would be greatly appreciated.
Also, if there's a web service that I can pass a url and receive an image, that would work just as well. I am NOT married to the idea of pywebshot.
I do know that the server I'm on is running X and has all the necessary python modules installed.
Thanks in advance.
Upvotes: 2
Views: 13198
Reputation: 1633
This is the code I used to get the screenshot of the whole scrolled webpage:
from PIL import Image
from io import BytesIO
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import logging
import os
import time
# Set default download folder for ChromeDriver
videos_folder = r"./download"
if not os.path.exists(videos_folder):
    os.makedirs(videos_folder)
prefs = {"download.default_directory": videos_folder}
def open_url(address):
    # SELENIUM SETUP
    logging.getLogger('WDM').setLevel(logging.WARNING)  # hide less relevant webdriver-manager messages
    chrome_options = Options()
    chrome_options.add_argument("--headless")  # run Chrome without a visible window
    chrome_options.add_experimental_option("prefs", prefs)
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
    driver.implicitly_wait(1)
    driver.maximize_window()
    driver.get(address)
    driver.set_window_size(1920, 1080)  # to set the screenshot width
    save_screenshot(driver, '{}/Screenshot.png'.format(videos_folder))
    driver.quit()
def save_screenshot(driver, file_name):
    # Resize the window to the full page size so one screenshot covers everything
    height, width = scroll_down(driver)
    driver.set_window_size(width, height)
    img_binary = driver.get_screenshot_as_png()
    img = Image.open(BytesIO(img_binary))
    img.save(file_name)
    # print(file_name)
    print("Screenshot saved!")
def scroll_down(driver):
    total_width = driver.execute_script("return document.body.offsetWidth")
    total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
    viewport_width = driver.execute_script("return document.body.clientWidth")
    viewport_height = driver.execute_script("return window.innerHeight")
    # Split the page into viewport-sized rectangles, row by row
    rectangles = []
    i = 0
    while i < total_height:
        ii = 0
        top_height = i + viewport_height
        if top_height > total_height:
            top_height = total_height
        while ii < total_width:
            top_width = ii + viewport_width
            if top_width > total_width:
                top_width = total_width
            rectangles.append((ii, i, top_width, top_height))
            ii = ii + viewport_width
        i = i + viewport_height
    previous = None
    part = 0
    # Scroll through every rectangle so lazily loaded content gets rendered
    for rectangle in rectangles:
        if previous is not None:
            driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
            time.sleep(0.5)
        # time.sleep(0.2)
        if rectangle[1] + viewport_height > total_height:
            offset = (rectangle[0], total_height - viewport_height)
        else:
            offset = (rectangle[0], rectangle[1])
        previous = rectangle
    return total_height, total_width
open_url("https://stackoverflow.com/questions/4091940/how-to-save-web-page-as-image-using-python")
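The tiling step inside scroll_down can be looked at on its own. Below is a minimal sketch of that logic as a pure function, with no browser involved; the name tile_rectangles is hypothetical and the sizes are made-up example values, so treat it as an illustration rather than part of the script above.

```python
def tile_rectangles(total_width, total_height, viewport_width, viewport_height):
    """Return (left, top, right, bottom) tiles that cover the page, row by row.

    Hypothetical helper mirroring the nested while-loops in scroll_down.
    Both viewport sizes must be positive integers.
    """
    rectangles = []
    for top in range(0, total_height, viewport_height):
        # Clamp the last row to the bottom of the page
        bottom = min(top + viewport_height, total_height)
        for left in range(0, total_width, viewport_width):
            # Clamp the last column to the right edge of the page
            right = min(left + viewport_width, total_width)
            rectangles.append((left, top, right, bottom))
    return rectangles


# Example: a 1920x2500 page seen through a 1920x1080 viewport
tiles = tile_rectangles(1920, 2500, 1920, 1080)
```

Each tile's top-left corner is what the script passes to window.scrollTo, so a page taller than the viewport is visited one screen at a time before the single full-size screenshot is taken.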
The script saves the full-page screenshot as Screenshot.png in the ./download folder.
The current stable release of ChromeDriver is 114.0.5735.90, which is not compatible with the current version (as of 2024.06.04) of Chrome (125.0.6422.141), so the script, as above, would not work.
To fix this, for now, the change unfortunately has to be made manually: download the ChromeDriver version matching the current stable version of Chrome (125.0.6422.141) from the ChromeDriver download page.
Once the chromedriver-linux64.zip archive has been saved, rename the extracted folder to the relevant Chrome version (125.0.6422.141) and move it to ~/.wdm/drivers/chromedriver/linux64/ (giving ~/.wdm/drivers/chromedriver/linux64/125.0.6422.141/chromedriver).
Then modify the script by replacing driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
with driver = webdriver.Chrome(service=Service(executable_path=os.path.expanduser("~/.wdm/drivers/chromedriver/linux64/125.0.6422.141/chromedriver")), options=chrome_options).
The path goes through os.path.expanduser because Python does not expand ~ on its own, and it is passed via Service because recent Selenium 4 releases no longer accept executable_path directly on webdriver.Chrome.
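The mismatch described above comes down to the major version numbers. Here is a small sketch of that check, assuming the usual rule that ChromeDriver and Chrome need the same major version; the function names are hypothetical and the version strings are the ones quoted in this answer.

```python
def major_version(version):
    """Return the leading number of a dotted version string such as '125.0.6422.141'."""
    return int(version.split(".")[0])


def versions_compatible(chrome_version, driver_version):
    """Assumed rule: the driver matches the browser when their major versions are equal."""
    return major_version(chrome_version) == major_version(driver_version)


chrome = "125.0.6422.141"
old_driver = "114.0.5735.90"
mismatch = not versions_compatible(chrome, old_driver)
```

Running a check like this before starting the driver gives a clearer error message than the session failure Selenium reports on a mismatch.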
That's all!
Upvotes: 2
Reputation: 11
from selenium import webdriver
from xvfbwrapper import Xvfb

# Start a virtual X display so Firefox can run on a server without a screen
d = Xvfb(width=400, height=400)
d.start()
browser = webdriver.Firefox()
url = "http://stackoverflow.com/questions/4091940/how-to-save-web-page-as-image-using-python"
browser.get(url)
destination = "screenshot_filename.png"  # Selenium writes PNG data
if browser.save_screenshot(destination):
    print("File saved in the destination filename")
browser.quit()
d.stop()  # shut down the virtual display
Upvotes: 1
Reputation: 2735
I found websnapr.com, a web service that returns the image with just a little bit of work.
import subprocess
# MYFILENAME and MYURL are placeholders for the output file name and the page URL
subprocess.Popen(['wget', '-O', MYFILENAME+'.png', 'http://images.websnapr.com/?url='+MYURL+'&size=s&nocache=82']).wait()
Easy as pie.
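One caveat: pasting MYURL straight into the query string breaks when the page URL itself contains & or ?. A minimal sketch of building the request URL with proper encoding follows, using only the standard library. The endpoint and the url, size, and nocache parameters are the ones from the wget call above; the function name websnapr_url is hypothetical, and I have not verified how the service treats other parameter values.

```python
from urllib.parse import urlencode

WEBSNAPR_ENDPOINT = "http://images.websnapr.com/"


def websnapr_url(page_url, size="s", nocache=82):
    """Build the thumbnail request URL, percent-encoding the target page URL."""
    query = urlencode({"url": page_url, "size": size, "nocache": nocache})
    return WEBSNAPR_ENDPOINT + "?" + query


# A page URL with its own query string stays intact after encoding
request_url = websnapr_url("http://example.com/?a=1&b=2")
```

The resulting string can be handed to wget as before, or fetched with urllib.request.urlretrieve(request_url, MYFILENAME + '.png').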
Upvotes: 0
Reputation: 77399
Let me guess, the server does not have an X server, right?
You may have to run a headless X server to get this working.
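One way to do that is to wrap the screenshot command in xvfb-run, which starts a temporary virtual display for the duration of the command. This is only a sketch, assuming Xvfb and its xvfb-run wrapper are installed on the server; the helper names are hypothetical, and the pywebshot arguments in the usage comment are placeholders because I don't know its exact command line.

```python
import shutil
import subprocess


def build_xvfb_command(command, screen="1024x768x24"):
    """Prefix a command with xvfb-run so it gets a virtual X display."""
    return [
        "xvfb-run",
        "--auto-servernum",                     # pick a free display number
        "--server-args=-screen 0 " + screen,    # size and depth of the virtual screen
    ] + list(command)


def run_headless(command):
    """Run the command under xvfb-run and return its exit code."""
    if shutil.which("xvfb-run") is None:
        raise RuntimeError("xvfb-run not found; install Xvfb first")
    return subprocess.call(build_xvfb_command(command))


# Usage (placeholder arguments):
# run_headless(["./pywebshot.py", "http://example.com"])
```

If the GTK warnings about "could not open display" disappear when run this way, the missing display was the cause of the segmentation fault.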
Upvotes: 0