anarchy

Reputation: 5174

Is there a way to make this Python Selenium code work in headless mode?

I asked a related question earlier ("Unable to get selenium (python) to download a csv file which doesnt have a link but only appears after i click the download button") and managed to get this far. I finally realised that the code wasn't working because it was running in headless mode.

In my earlier post I also mentioned that I'd try to use requests to get the file, but there doesn't seem to be a link for the CSV file in this case.

The code goes to https://www.macrotrends.net/1476/copper-prices-historical-chart-data, clicks the All Years button, then clicks the Download Historical Data button, and Selenium tries to save the file after the click.

But, as I said, it only downloads the file in normal mode; it doesn't seem to work in headless mode. Is there a reason for this? Is there a way to make it work in headless mode? I've been looking around, but I can't find an answer.


import time

from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options

start_time = time.time()

options = Options()

#options.add_argument("--headless")
options.add_argument("--disable-gpu")
options.add_argument("--disable-extensions")

options.add_experimental_option("prefs", {
  "download.default_directory": r"'/home/Documents/testing/macrotrends'",
  "download.prompt_for_download": False,
  "download.directory_upgrade": True,
  "safebrowsing.enabled": False
})

driver = webdriver.Chrome(executable_path=r'/home/chromedriver/chromedriver',options=options)


driver.get('https://www.macrotrends.net/1476/copper-prices-historical-chart-data')

time.sleep(5)
iframe = driver.find_element_by_xpath("//iframe[@id='chart_iframe']")
driver.switch_to.frame(iframe)
xpath = "//a[text()='All Years']"
driver.find_element_by_xpath(xpath).click()
xpath = "//button[@id='dataDownload']"
driver.find_element_by_xpath(xpath).click()
time.sleep(10)

driver.close()

print("--- %s seconds ---" % (time.time() - start_time))
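Side note on the prefs above: in `r"'/home/Documents/testing/macrotrends'"` the single quotes sit inside the double-quoted string, so the value handed to Chrome literally starts and ends with `'`. A minimal sketch of a hypothetical helper, `chrome_download_prefs` (not part of Selenium), that strips stray quotes and passes Chrome a plain absolute path, assuming the same four prefs as in the question:

```python
import os


def chrome_download_prefs(directory):
    """Build the dict for options.add_experimental_option("prefs", ...).

    Hypothetical helper: removes quote characters accidentally left inside
    the string, makes the path absolute and creates the folder.
    """
    # r"'/home/...'" keeps the single quotes *inside* the string; drop them
    cleaned = directory.strip().strip("'\"")
    path = os.path.abspath(os.path.expanduser(cleaned))
    os.makedirs(path, exist_ok=True)
    return {
        "download.default_directory": path,
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "safebrowsing.enabled": False,
    }
```

It would be used as `options.add_experimental_option("prefs", chrome_download_prefs("/home/Documents/testing/macrotrends"))`.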

[Screenshot of the website in Chrome]

Upvotes: 1

Views: 1316

Answers (2)

furas

Reputation: 142631

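You can use the pyvirtualdisplay module to create a virtual display (on Linux it runs on top of Xvfb, which must be installed). Chrome or Firefox started without headless will use it automatically, so the browser runs in normal mode but its window stays hidden.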

Chrome:

from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
import time

from pyvirtualdisplay import Display

display = Display(visible=0, size=(1920,1080))
display.start()

start_time = time.time()

options = Options()

###options.add_argument("--headless")
options.add_argument("--disable-gpu")
options.add_argument("--disable-extensions")

options.add_experimental_option("prefs", {
  "download.default_directory": "/home/Documents/testing/macrotrends", # plain string: no `r` prefix and no extra `' '` quotes inside
  "download.prompt_for_download": False,
  "download.directory_upgrade": True,
  "safebrowsing.enabled": False
})

driver = webdriver.Chrome(executable_path=r'/home/chromedriver/chromedriver',options=options)
#driver = webdriver.Chrome(options=options) # I have chromedriver's folder in PATH so I don't have to use `executable_path`

driver.get('https://www.macrotrends.net/1476/copper-prices-historical-chart-data')
print('[INFO] loaded', time.time() - start_time)
time.sleep(5)

iframe = driver.find_element_by_xpath("//iframe[@id='chart_iframe']")
driver.switch_to.frame(iframe)
print('[INFO] switched', time.time() - start_time)

xpath = "//a[text()='All Years']"
driver.find_element_by_xpath(xpath).click()
xpath = "//button[@id='dataDownload']"
driver.find_element_by_xpath(xpath).click()
print('[INFO] clicked', time.time() - start_time)
time.sleep(10)

print('[INFO] closing', time.time() - start_time)
driver.quit()  # quit() ends the browser and the driver process; close() only closes the current window
display.stop()
print('[INFO] end', time.time() - start_time)

Firefox:

from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.firefox.options import Options
import time

from pyvirtualdisplay import Display

display = Display(visible=0, size=(1920,1080))
display.start()

start_time = time.time()

options = Options()

###options.add_argument("--headless")
options.add_argument("--disable-gpu")
options.add_argument("--disable-extensions")

options.set_preference("browser.download.folderList", 2)
options.set_preference("browser.download.dir", "/home/Documents/testing/macrotrends") # plain string: no `r` prefix and no extra `' '` quotes inside
options.set_preference("browser.download.useDownloadDir", True)
options.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/csv")

driver = webdriver.Firefox(executable_path="...", options=options)
#driver = webdriver.Firefox(options=options) # I have geckodriver's folder in PATH so I don't have to use `executable_path`

driver.get('https://www.macrotrends.net/1476/copper-prices-historical-chart-data')
print('[INFO] loaded', time.time() - start_time)
time.sleep(5)

iframe = driver.find_element_by_xpath("//iframe[@id='chart_iframe']")
driver.switch_to.frame(iframe)
print('[INFO] switched', time.time() - start_time)

xpath = "//a[text()='All Years']"
driver.find_element_by_xpath(xpath).click()
xpath = "//button[@id='dataDownload']"
driver.find_element_by_xpath(xpath).click()
print('[INFO] clicked', time.time() - start_time)
time.sleep(10)

print('[INFO] closing', time.time() - start_time)
driver.quit()  # quit() ends the browser and the driver process; close() only closes the current window
display.stop()

print('[INFO] end', time.time() - start_time)
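Both versions above wait a fixed `time.sleep(10)` after the click. A sketch of an alternative, assuming the file lands in the configured download directory: poll that directory until a new file appears and no partial file is left (Chrome names in-progress downloads `*.crdownload`, Firefox uses `*.part`). `wait_for_download` is a hypothetical helper, not a Selenium API.

```python
import os
import time

# In-progress download suffixes: Chrome uses .crdownload, Firefox uses .part
PARTIAL_SUFFIXES = (".crdownload", ".part")


def wait_for_download(directory, before, timeout=30.0, poll=0.5):
    """Return the path of a finished file that appeared in `directory`.

    `before` is the set of file names present before clicking the button.
    Raises TimeoutError if no completed download shows up within `timeout`.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new = set(os.listdir(directory)) - before
        partial = [n for n in new if n.endswith(PARTIAL_SUFFIXES)]
        finished = [n for n in new if not n.endswith(PARTIAL_SUFFIXES)]
        # Firefox may create the target file and its .part file together,
        # so only report success once no partial file remains
        if finished and not partial:
            return os.path.join(directory, sorted(finished)[0])
        time.sleep(poll)
    raise TimeoutError(f"no completed download in {directory} after {timeout}s")
```

Take `before = set(os.listdir(download_dir))` just before clicking Download Historical Data, then call `wait_for_download(download_dir, before)` in place of the sleep.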

Upvotes: 3

Corey Goldberg

Reputation: 60604

Downloads are disabled by default in headless mode. You can allow them by executing a Chrome DevTools Protocol command like this:

from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")  # the `options.headless` property is deprecated in newer Selenium releases
driver = Chrome(options=options)
params = {'behavior': 'allow', 'downloadPath': '/path/for/download'}
driver.execute_cdp_cmd('Page.setDownloadBehavior', params)
# downloads are now enabled for this driver instance
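The same DevTools call can be wrapped so the download folder exists and is passed as an absolute path before Chrome is told to use it. A minimal sketch; `enable_headless_downloads` is a hypothetical name, and it relies only on `execute_cdp_cmd`, which Selenium's Chrome driver provides:

```python
import os


def enable_headless_downloads(driver, download_dir):
    """Allow downloads into `download_dir` for this driver instance.

    `driver` is assumed to be a Selenium Chrome WebDriver (anything that
    has an execute_cdp_cmd(cmd, params) method works).
    """
    path = os.path.abspath(download_dir)
    os.makedirs(path, exist_ok=True)  # make sure the target folder exists
    params = {"behavior": "allow", "downloadPath": path}
    # Same DevTools command as in the snippet above
    driver.execute_cdp_cmd("Page.setDownloadBehavior", params)
    return params
```

Call it right after creating the headless driver, e.g. `enable_headless_downloads(driver, "/home/Documents/testing/macrotrends")`, and then run the clicks from the question.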

Upvotes: 2
