Anna Ignashkina

Reputation: 497

Get page source after each button click in a loop with find_elements_by_xpath using Selenium

I would like to automatically download the text files from the ATS Blocks Download section of the FINRA website. The problem is that, while I am able to click on the icon and open the file in the browser, I cannot get the page source after the click: driver.page_source returns the page source of the ATS Blocks Download page (the one before the click).

Here is a piece of code I was trying out:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import time 


driver = webdriver.Chrome(ChromeDriverManager().install())
URL = 'https://otctransparency.finra.org/otctransparency/'
driver.get(URL)

# Agree to the general terms
driver.find_element_by_xpath('//*[@class="btn btn-warning"]').click()

#go to ATS Blocks Download section
driver.find_element_by_xpath('//*[@href="/otctransparency/AtsBlocksDownload"]').click()

#wait for the page to fully load
time.sleep(5)

#click on each download icon
for element in driver.find_elements_by_xpath('//*[@src="./assets/icon_download.png"]'):
    element.click()
    print(driver.page_source)

How can I get the page source after every element.click()?

Upvotes: 0

Views: 847

Answers (3)

natn2323

Reputation: 2061

Avoid mixing different waiting mechanisms (for example implicit and explicit waits), as doing so can result in unexpected behavior (see this StackOverflow post for why).

Be careful when setting an implicit wait time: once it is set, it applies for the lifetime of the driver instance (source, although this has been said in various places across the web).

If you want your driver to wait on elements across multiple pages, use WebDriverWait. As shown in other replies, WebDriverWait(driver, timeout) takes a WebDriver instance and a timeout, the number of seconds to wait before raising a TimeoutException.

You can create a new WebDriverWait instance every time you look up an element, without having to create a new WebDriver instance with a different implicit wait time. Since each element may need to be waited on for a different duration, this is ideal. You could even write a wrapper function to encapsulate the use of WebDriverWait:

def PatientlyClick(by, path, driver, timeout):
    # Wait up to `timeout` seconds for the element to become clickable, then click it
    WebDriverWait(driver, timeout).until(EC.element_to_be_clickable((by, path))).click()

This snippet could be made tidier by designing a class that encapsulates your WebDriver instance, but that might be unnecessary for your purposes (see the Page Object Model design pattern).

Upvotes: 0

KunduK

Reputation: 33384

To get the page_source of every page, induce a WebDriverWait with element_to_be_clickable() before each click, with visibility_of_all_elements_located() for the download icons, and with number_of_windows_to_be() for the tab that each click opens.

Code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
URL = 'https://otctransparency.finra.org/otctransparency/'
driver.get(URL)
driver.maximize_window()
# Agree to the general terms
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@class="btn btn-warning"]'))).click()

# Go to the ATS Blocks Download section
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//a[@href="/otctransparency/AtsBlocksDownload"]'))).click()

# Wait until all download icons are visible
elements = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, '//img[@src="./assets/icon_download.png"]')))

# Click on each download icon
for link in range(len(elements)):
    # Re-locate the icons on every pass so the reference used for the click is fresh
    elements = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, '//img[@src="./assets/icon_download.png"]')))
    elements[link].click()
    # Each click opens the file in a second window/tab
    WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2))
    windowhandles = driver.window_handles
    driver.switch_to.window(windowhandles[-1])
    # Wait for the file's text, which the browser renders inside a <pre> element
    WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.TAG_NAME, "pre")))
    print(driver.page_source)
    # Close the file tab and return to the download list
    driver.close()
    driver.switch_to.window(windowhandles[0])

Upvotes: 1

Michael Krezalek

Reputation: 114

Every time you click a download icon, another browser tab is opened, and driver.page_source always returns the source of the window the driver is currently focused on. To get the page source from the new tab, switch to it first:

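from selenium.webdriver.common.by import By

# Maximum time (in seconds) any page load may take before a TimeoutException is raised
driver.set_page_load_timeout(120)

for element in driver.find_elements(By.XPATH, '//*[@src="./assets/icon_download.png"]'):
    element.click()
    # The click opens the file in a second tab (assumed to be open already); switch to it
    driver.switch_to.window(driver.window_handles[1])
    print(driver.page_source)
    # Close the tab so window_handles[1] refers to the next new tab on the following pass
    driver.close()
    # Go back to the original tab that lists the download icons
    driver.switch_to.window(driver.window_handles[0])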

PS. Instead of:

time.sleep(5)

you can do:

driver.set_page_load_timeout(120)

Upvotes: 0
