gweilo8

Reputation: 53

Tick a checkbox using Selenium webdriver in Python

Fellows,

I'm doing some web scraping and need to download multiple PDFs from the www1.hkexnews.hk website.

However, I ran into a problem while trying to make my Selenium chromedriver tick the checkbox that appears every time one wants to download a PDF on that website. The code executes, but the box still appears unticked.

Please refer to my source code below - would appreciate any advice!

driver = webdriver.Chrome('/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/chromedriver',options=chrome_options)
driver.implicitly_wait(10)
driver.maximize_window()

start_address = "https://www1.hkexnews.hk/app/appyearlyindex.html?lang=en&board=mainBoard&year=2021"

driver.get(start_address)
PDF_link = driver.find_element_by_xpath("//a[contains(text(),'Full Version')]")
print("Now clicking...'", PDF_link.text,"'")
PDF_link.click()

checkbox = driver.find_element_by_id('warning-statement-accept')
print("Now clicking...", checkbox.text)
checkbox.click

Edit: Thank you guys! The downloading works fine now. Just one small follow-up question: how can I modify the downloading code to save each PDF under its company name, which is available through all_names = driver.find_elements_by_xpath("//div[@class='applicant-name']")?

At the moment, I am using the automatic download options below, so I guess the downloading logic would have to be adjusted. (I would rather download the PDFs with the correct names in the first place than use the dirty workaround of renaming them with Python once they're saved.)

chrome_options.add_experimental_option('prefs', {
"download.default_directory": "/Users/XXX/Downloads", #Change default directory for downloads
"download.prompt_for_download": False, #To auto download the file
"download.directory_upgrade": True,
"plugins.always_open_pdf_externally": True #It will not show PDF directly in chrome
})
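For reference, these prefs only choose the download directory: Chrome still saves each file under the name the server sends, and while a download is running it writes a partial file with a .crdownload suffix. If renaming after the fact turns out to be acceptable, the awkward part is knowing when a file has finished. Below is a minimal sketch, assuming nothing else drops new PDFs into the download directory at the same time; wait_for_new_pdf and move_to_name are hypothetical helpers, not Selenium APIs.

```python
import os
import shutil
import time


def wait_for_new_pdf(directory, before, timeout=60.0, poll=0.5):
    """Wait until a new PDF has appeared in `directory` (compared with the
    set of file names in `before`) and no Chrome partial downloads remain."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = set(os.listdir(directory))
        # Chrome keeps a '<name>.crdownload' file while a download is running
        in_progress = any(n.endswith(".crdownload") for n in names)
        new_pdfs = [n for n in names - before if n.lower().endswith(".pdf")]
        if new_pdfs and not in_progress:
            # If several PDFs arrived, pick the most recently modified one
            newest = max(new_pdfs, key=lambda n: os.path.getmtime(os.path.join(directory, n)))
            return os.path.join(directory, newest)
        time.sleep(poll)
    raise TimeoutError(f"no finished PDF appeared in {directory} within {timeout}s")


def move_to_name(src, directory, company_name):
    """Move `src` to '<company_name>.pdf' in `directory`, replacing every
    character that is not a letter, digit, space, '-', '_' or '.' with '_'."""
    safe = "".join(c if c.isalnum() or c in " -_." else "_" for c in company_name).strip()
    dest = os.path.join(directory, f"{safe or 'unnamed'}.pdf")
    shutil.move(src, dest)
    return dest
```

Typical use: take before = set(os.listdir(download_dir)) just before clicking OK in the warning dialog, then call move_to_name(wait_for_new_pdf(download_dir, before), download_dir, name) with the matching applicant-name text. str.isalnum() is also true for Chinese characters, so company names in Chinese survive the sanitising step.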

Upvotes: 2

Views: 479

Answers (2)

robots.txt

Reputation: 137

This should do it:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = "https://www1.hkexnews.hk/app/appyearlyindex.html?lang=en&board=mainBoard&year=2021"

driver = webdriver.Chrome()
wait = WebDriverWait(driver,10)

driver.get(link)
elem = wait.until(EC.presence_of_element_located((By.XPATH,"//tr[@class='record-ap-phip']//a[contains(.,'Full Version')]")))
elem.click()
wait.until(EC.presence_of_element_located((By.XPATH,"//*[@id='warning-statement-dialog']//label[@for='warning-statement-accept']"))).click()
wait.until(EC.presence_of_element_located((By.XPATH,"//*[@id='warning-statement-dialog']//a[contains(@class,'btn-ok')]"))).click()

Here is a modified version of the script that loops over every "Full Version" link and closes each newly opened tab. I didn't include the downloading logic in the script; I suppose you can do that yourself.

driver.get(link)
current = driver.current_window_handle
for elem in wait.until(EC.presence_of_all_elements_located((By.XPATH,"//tr[@class='record-ap-phip']//a[contains(.,'Full Version')]"))):
    elem.click()
    wait.until(EC.presence_of_element_located((By.XPATH,"//*[@id='warning-statement-dialog']//label[@for='warning-statement-accept']"))).click()
    wait.until(EC.presence_of_element_located((By.XPATH,"//*[@id='warning-statement-dialog']//a[contains(@class,'btn-ok')]"))).click()
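    # Pass the handles that existed before the click; without an argument the condition never actually waits
    wait.until(EC.new_window_is_opened([current]))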
    driver.switch_to.window([window for window in driver.window_handles if window != current][0])
    print(driver.current_url)
    driver.close()
    driver.switch_to.window(current)

driver.quit()
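The downloading logic left out above could look roughly like the sketch below, which also covers the follow-up about naming files by company. It rests on several assumptions: that the tab opened after clicking OK points straight at the PDF (the loop prints it as driver.current_url), that the file can be fetched without the browser's cookies, and that the applicant-name divs from the question appear in the same order as the "Full Version" links. The helper names pdf_filename and download_pdf are hypothetical. Fetching the URL yourself means the file name is chosen up front instead of being fixed up after Chrome saves it.

```python
import os
import re
import urllib.request


def pdf_filename(company_name, seen):
    """Turn an applicant name into a safe, unique '<name>.pdf' file name."""
    # Replace characters that are illegal in Windows/macOS file names
    base = re.sub(r'[\\/:*?"<>|]+', "_", company_name).strip() or "unnamed"
    name = f"{base}.pdf"
    n = 2
    while name in seen:  # the same applicant may appear more than once
        name = f"{base} ({n}).pdf"
        n += 1
    seen.add(name)
    return name


def download_pdf(url, directory, filename):
    """Fetch `url` and save it as `directory/filename`; return the path."""
    path = os.path.join(directory, filename)
    # Assumption: a browser-like User-Agent is enough for the server to respond
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=60) as resp, open(path, "wb") as out:
        out.write(resp.read())
    return path

# Inside the loop above, after switching to the new tab (assuming names[i]
# belongs to the i-th 'Full Version' link):
#   names = [e.text for e in driver.find_elements(By.XPATH, "//div[@class='applicant-name']")]
#   download_pdf(driver.current_url, "/Users/XXX/Downloads", pdf_filename(names[i], seen))
```

Keeping a shared seen set across the loop stops a second filing by the same applicant from overwriting the first one.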

Upvotes: 1

Prophet

Reputation: 33361

There are several issues here:

  1. The "checkbox" locator is wrong.
  2. Your current code will download only the first PDF file.
  3. It is preferable to use explicit waits with expected conditions instead of an implicit wait.

This should work better:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# chrome_options is the ChromeOptions object (with the download prefs) from the question;
# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/chromedriver'), options=chrome_options)
wait = WebDriverWait(driver, 20)

driver.maximize_window()

start_address = "https://www1.hkexnews.hk/app/appyearlyindex.html?lang=en&board=mainBoard&year=2021"

driver.get(start_address)
PDF_link = wait.until(EC.visibility_of_element_located((By.XPATH, "//a[contains(text(),'Full Version')]")))

print("Now clicking...'", PDF_link.text,"'")
PDF_link.click()

checkbox = wait.until(EC.visibility_of_element_located((By.XPATH, "//div[./label[@for='warning-statement-accept']]//input")))
print("Now clicking...", checkbox.text)
checkbox.click()  # the parentheses are required to actually perform the click
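Whichever locator is used, the click needs parentheses. In the question's code, checkbox.click only looks up the bound method and throws it away, so Python raises no error and nothing is ever sent to the browser. The plain-Python sketch below uses a hypothetical FakeCheckbox class as a stand-in for a Selenium WebElement to show the difference.

```python
class FakeCheckbox:
    """Hypothetical stand-in for a Selenium WebElement."""

    def __init__(self):
        self.clicks = 0

    def click(self):
        self.clicks += 1


box = FakeCheckbox()
box.click          # attribute lookup only: evaluates to a bound method, no call
print(box.clicks)  # prints 0
box.click()        # actually invokes the method
print(box.clicks)  # prints 1
```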

Upvotes: 0
