Xin

Reputation: 674

How to select tab and scrape results for all pages with Selenium?

I have created the function below to scrape results from the website, I am wondering how to:

  1. First click on the "Tables (8,899)" tab and only scrape results from there.
  2. Right now it only scrapes the first page. How would I go about scraping all the pages and appending them into one dataframe, without having to specify the number of pages?

Function:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium import webdriver
from functools import reduce
import pandas as pd

def stats_canada():
    driver = webdriver.Chrome('/Users/wwds/Desktop/chromedriver')
    driver.get('https://www150.statcan.gc.ca/n1/en/type/data?count=100&p=-All%2C5-data/tables#all')
    elements = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "#all a[target='_self']")))
    linkTitles = pd.DataFrame([title.text for title in elements]).rename(columns={0: 'Name'})
    links = pd.DataFrame([link.get_attribute("href") for link in elements]).rename(columns={0: 'Link'})
    elements = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "#all span[class='ndm-result-date']")))
    release_date = pd.DataFrame([date.text for date in elements]).rename(columns={0: 'Release Date'})
    elements = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "#all div[class='ndm-result-productid']")))
    table_id = pd.DataFrame([table.text for table in elements]).rename(columns={0: 'Table ID'})
    table_id['Table ID'] = table_id['Table ID'].str.replace("Table: ", "")
    data = reduce(lambda x, y: pd.merge(x, y, left_index=True, right_index=True), [linkTitles, links, release_date, table_id])
    return data


stats_canada()

Thanks in advance!

Upvotes: 1

Views: 335

Answers (1)

Tanmoy Datta

Reputation: 1644

Firstly, you have the id for the "Tables (8,899)" tab, and you have to click on it. For this you can use the following:

import time

elem = driver.find_element_by_id('tables-lnk')
elem.click()
time.sleep(10)  # this delay is for loading the page

Now scrape every entry from this page using Selenium or Beautiful Soup, whichever you are familiar with, and add them to your dataframe.

Then click the "Next" button at the bottom of the page; you can find the button's id and click it in the same way.
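Putting the steps together, here is a minimal sketch of the click-the-tab-then-paginate loop. The `tables-lnk` id comes from the snippet above; the `#tables` container id, the result selectors, and the `Next` link text are assumptions carried over from the question's selectors and may need adjusting against the live page.

```python
import pandas as pd

def build_frame(names, links, dates, table_ids):
    # Assemble one page of results into a DataFrame; this replaces the
    # four-way index merge in the original function.
    return pd.DataFrame({
        'Name': names,
        'Link': links,
        'Release Date': dates,
        'Table ID': [t.replace("Table: ", "") for t in table_ids],
    })

def scrape_all_tables():
    # Selenium is imported inside the function so the helpers above
    # stay importable without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import NoSuchElementException

    driver = webdriver.Chrome()  # assumes chromedriver is on PATH
    driver.get('https://www150.statcan.gc.ca/n1/en/type/data?count=100')
    wait = WebDriverWait(driver, 30)

    # 1. Click the "Tables (8,899)" tab via its id.
    wait.until(EC.element_to_be_clickable((By.ID, 'tables-lnk'))).click()

    frames = []
    while True:
        # 2. Scrape one page of results (selectors are assumptions
        #    adapted from the question's "#all ..." selectors).
        rows = wait.until(EC.visibility_of_all_elements_located(
            (By.CSS_SELECTOR, "#tables a[target='_self']")))
        names = [r.text for r in rows]
        links = [r.get_attribute('href') for r in rows]
        dates = [d.text for d in driver.find_elements(
            By.CSS_SELECTOR, "#tables span.ndm-result-date")]
        ids = [t.text for t in driver.find_elements(
            By.CSS_SELECTOR, "#tables div.ndm-result-productid")]
        frames.append(build_frame(names, links, dates, ids))

        # 3. Advance to the next page; stop when no "Next" link remains.
        try:
            driver.find_element(By.LINK_TEXT, 'Next').click()
        except NoSuchElementException:
            break

    driver.quit()
    # pd.concat avoids having to know the page count in advance.
    return pd.concat(frames, ignore_index=True)
```

Because the loop stops only when the "Next" link disappears, you never need to hard-code the number of pages; `pd.concat` stitches the per-page frames into one dataframe at the end.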

Upvotes: 2
