Reputation: 674
I have created the function below to scrape results from the website. I am wondering how to:
Function:
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium import webdriver
from functools import reduce

def stats_canada():
    driver = webdriver.Chrome('/Users/wwds/Desktop/chromedriver')
    driver.get('https://www150.statcan.gc.ca/n1/en/type/data?count=100&p=-All%2C5-data/tables#all')
    # Result titles and their links
    elements = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "#all a[target='_self']")))
    linkTitles = pd.DataFrame([title.text for title in elements]).rename(columns = {0 : 'Name'})
    links = pd.DataFrame([link.get_attribute("href") for link in elements]).rename(columns = {0 : 'Link'})
    # Release dates
    elements = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "#all span[class='ndm-result-date']")))
    release_date = pd.DataFrame([date.text for date in elements]).rename(columns = {0 : 'Release Date'})
    # Table IDs, with the "Table: " prefix stripped
    elements = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "#all div[class='ndm-result-productid']")))
    table_id = pd.DataFrame([table.text for table in elements]).rename(columns = {0 : 'Table ID'})
    table_id['Table ID'] = table_id['Table ID'].str.replace("Table: ", "")
    # Join the four single-column frames on their row index
    data = reduce(lambda x,y: pd.merge(x, y, left_index = True, right_index = True), [linkTitles, links, release_date, table_id])
    return data

stats_canada()
Thanks in advance!
Upvotes: 1
Views: 335
Reputation: 1644
First, the "Tables (8,899)" tab has an id, so you can locate it by that id and click it. For this you can use the following:
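import time

# find_element(By.ID, ...) replaces find_element_by_id, which newer Selenium 4 releases removed
elem = driver.find_element(By.ID, 'tables-lnk')
elem.click()
time.sleep(10)  # this delay is for loading the page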
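A fixed sleep either waits longer than needed or not long enough. One alternative is to poll for a condition until it holds or a timeout expires, which is what the WebDriverWait calls in the question do for the browser. The idea can be sketched with the standard library alone; `wait_until` and `page_loaded` are hypothetical names used only for illustration, and `page_loaded` is an offline stand-in for a real check such as "the results list is visible".

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 10.0, poll: float = 0.5) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


# Offline stand-in for "the page has loaded": becomes true on the third check.
checks = {"count": 0}


def page_loaded() -> bool:
    checks["count"] += 1
    return checks["count"] >= 3


ready = wait_until(page_loaded, timeout=2.0, poll=0.01)
print(ready)
```

With a real driver, the condition would be something that inspects the page, and the call returns as soon as the page is ready instead of always waiting the full 10 seconds.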
Now scrape every entry from this page using Selenium or Beautiful Soup, whichever you are familiar with, and add the entries to your DataFrame.
Then click the "next" button at the bottom of the page. You can find the button's id and click it the same way as above. Repeat until there is no next page.
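The scrape-then-click-next loop described above can be sketched as follows. This is a minimal outline, not a tested scraper for the site: `collect_all_pages`, `scrape_page`, and `go_to_next_page` are hypothetical names, and the two callbacks are replaced by offline fakes so the sketch runs without a browser. In a real run, `scrape_page` would wrap the element lookups from the question and `go_to_next_page` would click the next button and return False once it is missing.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def collect_all_pages(
    scrape_page: Callable[[], List[Row]],
    go_to_next_page: Callable[[], bool],
    max_pages: int = 100,
) -> List[Row]:
    """Scrape the current page, then keep moving to the next one until none is left.

    scrape_page     -- hypothetical callback returning the rows on the current page
    go_to_next_page -- hypothetical callback that clicks "next" and returns
                       False when there is no further page
    max_pages       -- safety cap so a misbehaving "next" button cannot loop forever
    """
    rows: List[Row] = []
    for _ in range(max_pages):
        rows.extend(scrape_page())
        if not go_to_next_page():
            break
    return rows


# Stand-in for the browser: three fake result pages (made-up data).
fake_pages = [
    [{"Name": "Table A", "Table ID": "fake-1"}],
    [{"Name": "Table B", "Table ID": "fake-2"}],
    [{"Name": "Table C", "Table ID": "fake-3"}],
]
position = {"page": 0}


def fake_scrape() -> List[Row]:
    return fake_pages[position["page"]]


def fake_next() -> bool:
    if position["page"] + 1 >= len(fake_pages):
        return False
    position["page"] += 1
    return True


all_rows = collect_all_pages(fake_scrape, fake_next)
print(len(all_rows))
```

The resulting list of dicts can be passed to `pd.DataFrame(all_rows)` to get one frame covering every page, which avoids merging four separate frames per page.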
Upvotes: 2