coverflower

Reputation: 91

Taking screenshot of whole page with python selenium and Firefox or Chrome headless

This post is related to this one:

Python selenium screen capture not getting whole page

The solution with PhantomJS seems to work:

driver = webdriver.PhantomJS()    
driver.maximize_window()
driver.get('http://www.angelfire.com/super/badwebs/')  
scheight = .1
while scheight < 9.9:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight/%s);" % scheight)
    scheight += .01        
driver.save_screenshot('angelfire_phantomjs.png')

However, the solution is from 2014 and PhantomJS has since been deprecated. I'm getting this warning:

...
UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '

If I try to adapt it to, e.g., headless Firefox like this:

from selenium import webdriver

firefox_options = webdriver.FirefoxOptions()
firefox_options.set_headless() 
firefox_driver = webdriver.Firefox(firefox_options=firefox_options)

firefox_driver.get('http://www.angelfire.com/super/badwebs/')  
scheight = .1
while scheight < 9.9:
    firefox_driver.execute_script("window.scrollTo(0, document.body.scrollHeight/%s);" % scheight)
    scheight += .01        
firefox_driver.save_screenshot('angelfire_firefox.png')

a screenshot is made but not of the whole page.

Any ideas how to make it work with Firefox or Chrome headless?

(P.S. I also found this post:

Taking Screenshot of Full Page with Selenium Python (chromedriver)

but it doesn't seem to be a general solution, and it is much more complicated.)

Upvotes: 9

Views: 17023

Answers (1)

vc2279

Reputation: 219

This is the method I came up with; it takes a full screenshot of a page of any length. It relies on the fact that a headless browser's window can be set to any size when it is launched. The challenge is getting the scroll height before launching the headless browser, so the page is first loaded in a regular browser to measure it. The only drawback is that the site is loaded twice.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time

url = 'any website url'

# First run: open the page in a regular browser to measure its height
driver = webdriver.Chrome()
driver.get(url)
# Pause 3 seconds to let the page load
time.sleep(3)
# Take the largest of the usual document height measures
height = driver.execute_script("return Math.max( document.body.scrollHeight, document.body.offsetHeight, document.documentElement.clientHeight, document.documentElement.scrollHeight, document.documentElement.offsetHeight )")
print(height)
# Shut down the browser and end the session
driver.quit()

# Second run: open a headless browser whose window is as tall as the page
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument(f"--window-size=1920,{height}")
chrome_options.add_argument("--hide-scrollbars")
driver = webdriver.Chrome(options=chrome_options)

driver.get(url)
# Pause 3 seconds to let the page load
time.sleep(3)
# Save the screenshot
driver.save_screenshot('screen_shot.png')
driver.quit()
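
If loading the site twice is a concern, one possible variation is to measure the height inside the headless session itself and then resize the window with `set_window_size` before taking the screenshot. The sketch below is untested against real sites and assumes that in headless Chrome the viewport follows the window size; the helper names `make_headless_chrome` and `screenshot_whole_page`, and the `width` and `settle_seconds` parameters, are hypothetical and not part of Selenium.

```python
import time

# Same height expression as in the code above: the largest of the usual
# document height measures.
PAGE_HEIGHT_JS = (
    "return Math.max("
    "document.body.scrollHeight, document.body.offsetHeight, "
    "document.documentElement.clientHeight, "
    "document.documentElement.scrollHeight, "
    "document.documentElement.offsetHeight)"
)


def make_headless_chrome():
    """Start headless Chrome (assumes selenium and chromedriver are installed)."""
    # Imported lazily so screenshot_whole_page can be used with any driver object.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    options.add_argument("--hide-scrollbars")
    return webdriver.Chrome(options=options)


def screenshot_whole_page(driver, url, path, width=1920, settle_seconds=3):
    """Load url, grow the window to the page height, then save a screenshot.

    Assumption: the headless viewport tracks the window size, so resizing
    the window is enough to bring the whole page into the capture.
    """
    driver.get(url)
    time.sleep(settle_seconds)  # crude wait for the page to load
    height = driver.execute_script(PAGE_HEIGHT_JS)
    driver.set_window_size(width, height)
    time.sleep(settle_seconds)  # give lazily loaded content a chance to render
    driver.save_screenshot(path)
    return height


# Example usage (requires a working Chrome and chromedriver):
#
#   driver = make_headless_chrome()
#   try:
#       screenshot_whole_page(driver, "any website url", "screen_shot.png")
#   finally:
#       driver.quit()
```

For Firefox, Selenium 4 also exposes `save_full_page_screenshot` on the Firefox driver, which may make the resizing step unnecessary there.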

Upvotes: 21
