Python web scraping/ data extraction

Question

For my master's thesis, I am exploring the possibility of extracting data from a website via web automation. The steps are as follows:

1. Sign in to the website (https://www.metal.com/Copper/201102250376)
2. Input username and password
3. Click sign-in
4. Change date to 01/01/2020
5. Scrape the generated table data and save it to a CSV file
6. Save to a specific folder with a specific name on my PC
7. Run the same sequence to download additional historical price data for other materials in a new tab in the same browser window

I am stuck on steps 5, 6 and 7.

import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Raw string so the backslashes in the Windows path are kept literally
DRIVER_PATH = r'C:\webdriver\chromedriver.exe'
chrome_options = Options()
driver = webdriver.Chrome(service=Service(executable_path=DRIVER_PATH), options=chrome_options)
driver.maximize_window()
driver.get('https://www.metal.com/Copper/201102250376')

# Login steps
LoginClick1 = driver.find_element(By.CSS_SELECTOR, '#__next > div > div.smm-component-header-en > div.main > div.right > button.button.sign-in')
LoginClick1.click()
user_input = driver.find_element(By.ID, 'user_name')
user_input.send_keys('#####')
password_input = driver.find_element(By.ID, 'password')
password_input.send_keys('####')
Submit = driver.find_element(By.CSS_SELECTOR, 'body > div:nth-child(17) > div > div.ant-modal-wrap.ant-modal-centered.smm-component-sign-en > div > div.ant-modal-content > div > div > div > div.smm-component-sign-en-content > form > div:nth-child(3) > div > div > span > button')
Submit.click()
time.sleep(2)

# Scroll down to the point of interest in the page
driver.execute_script("window.scrollBy(0,1000)", "")

# Change currency
driver.find_element(By.XPATH, "//img[contains(@class,'icon___BUqam')]").click()
time.sleep(1)

# Change date from the datepicker
date_input = driver.find_element(By.XPATH, '//*[@id="__next"]/div/div[5]/div[1]/div[7]/div[1]/div[2]/div[1]/span[1]/div/i')
date_input.click()
# Delete the existing 10-character date, type the new one, then confirm with Enter
action = ActionChains(driver)
action.move_to_element(date_input).send_keys(Keys.BACKSPACE * 10).send_keys("01/01/2020").send_keys(Keys.ENTER).perform()
time.sleep(2)

I am stuck trying to scrape the data from the generated table and save it to a CSV file using Selenium. A row of the generated table, as rendered on the page, is shown below:

**May 27, 2022** **10,758.75-10,788.43** **10,773.59** **+97.94** **USD/mt**

Any help would be massively appreciated.

Download file using button press

# Download button
driver.find_element(By.XPATH, "//img[contains(@src,'https://static.metal.com/www.metal.com/4.1.161/static/images/price/download.png')]").click()
time.sleep(1)
driver.find_element(By.XPATH, "//img[contains(@src,'https://static.metal.com/www.metal.com/4.1.161/static/images/price/download_excel.png')]").click()

To save time since I have multiple files/data to download, I am also exploring the possibility of directly saving the file via the download button provided.

The problem I encounter is that I am not able to directly specify the filename I want it to be saved as.
Upon click, the download button opens a new tab and then closes within seconds to initialize the file download.
The file is then downloaded with a materialcode-today's date file naming format.

Have you any idea on how to go about this?

Darshan Shah · Accepted Answer

The sign-in button is not being clicked because the XPath //*[@id="__next"]/div/div[3]/div[2]/div[2]/button[2] is incorrect. The element with id __next is the main container div, and the rest of the path navigates from it to the sign-in button through the remaining HTML node structure.

Instead, you can select the sign-in button directly by its class value: //button[@class='button sign-in']

Your solution for signing in would look like this:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Raw string so the backslashes in the Windows path are kept literally
driver = webdriver.Chrome(service=Service(executable_path=r'C:\webdrivers\chromedriver.exe'))
driver.maximize_window()
driver.get('https://www.metal.com/Nickel/201102250239')
# Click on Sign In
driver.find_element(By.XPATH, "//button[@class='button sign-in']").click()
# Enter username
driver.find_element(By.ID, "user_name").send_keys("your username")
# Enter password
driver.find_element(By.ID, "password").send_keys("your password")
# Click Sign In
driver.find_element(By.XPATH, "//button[@type='submit']").click()

To scrape the data:

# Each history row holds its five values in child div elements
for row in driver.find_elements(By.CLASS_NAME, "historyBodyRow___1Bk9u"):
    cells = row.find_elements(By.TAG_NAME, "div")
    print("Date=" + cells[0].text)
    print("Price Range=" + cells[1].text)
    print("Avg=" + cells[2].text)
    print("Change=" + cells[3].text)
    print("Unit=" + cells[4].text)
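The scraped cell text arrives as plain strings such as "10,758.75-10,788.43". If numeric values are needed later, the strings can be converted before they are stored. The following is only a minimal sketch: `parse_history_row` is a hypothetical helper (not part of Selenium), and it assumes every row has the same five-cell layout and formats as the sample row quoted in the question, including an abbreviated English month name in the date.

```python
from datetime import datetime


def parse_history_row(cells):
    """Convert the five text cells of one table row into typed values.

    Assumed input layout (taken from the sample row in the question):
    ["May 27, 2022", "10,758.75-10,788.43", "10,773.59", "+97.94", "USD/mt"]
    """
    date_text, range_text, avg_text, change_text, unit = cells

    def to_float(text):
        # Remove thousands separators before converting to float.
        return float(text.strip().replace(",", ""))

    # The price range is assumed to be "low-high" with non-negative prices.
    low_text, high_text = range_text.split("-", 1)
    return {
        "date": datetime.strptime(date_text.strip(), "%b %d, %Y").date().isoformat(),
        "low": to_float(low_text),
        "high": to_float(high_text),
        "avg": to_float(avg_text),
        "change": to_float(change_text),
        "unit": unit.strip(),
    }
```

With Selenium, the input list could be built from one row as `[cell.text for cell in cells[:5]]`. If the site formats dates or ranges differently for other materials, the format string and the split would need adjusting.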

Add to CSV

import csv

# The with block closes the file automatically; newline='' avoids blank lines between rows on Windows
with open('Path where you want to store the file', 'w', newline='') as f:
    writer = csv.writer(f)
    for row in driver.find_elements(By.CLASS_NAME, "historyBodyRow___1Bk9u"):
        cells = row.find_elements(By.TAG_NAME, "div")
        # Date, price range, average, change, unit
        writer.writerow([cell.text for cell in cells[:5]])
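The question also asks how to give a chosen name to the file saved through the site's Download button. One possible approach, sketched here under assumptions, is to let the browser save the file under its default name and rename it afterwards. The helpers `snapshot_names` and `wait_and_rename` are hypothetical, and the sketch assumes that the download folder is known (for Chrome it can be set through the `download.default_directory` preference) and that Chrome marks unfinished downloads with a `.crdownload` suffix.

```python
import time
from pathlib import Path

# Suffixes treated as "still downloading"; .crdownload is what Chrome uses,
# .tmp is skipped as well purely as a precaution.
IN_PROGRESS_SUFFIXES = {".crdownload", ".tmp"}


def snapshot_names(download_dir):
    """Return the set of file names currently present in the folder."""
    return {p.name for p in Path(download_dir).iterdir()}


def wait_and_rename(download_dir, before, new_name, timeout=30.0, poll=0.5):
    """Wait for a new finished file to appear, then rename it to new_name.

    `before` is the snapshot taken just before the download was triggered.
    Raises TimeoutError if no new finished file appears within `timeout` seconds.
    """
    folder = Path(download_dir)
    deadline = time.monotonic() + timeout
    while True:
        new_files = [
            p for p in folder.iterdir()
            if p.is_file()
            and p.name not in before
            and p.suffix not in IN_PROGRESS_SUFFIXES
        ]
        if new_files:
            target = folder / new_name
            # replace() overwrites an existing file with the same name.
            new_files[0].replace(target)
            return target
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished download appeared in {folder}")
        time.sleep(poll)
```

In use, `snapshot_names(folder)` would be called just before clicking the download buttons, and `wait_and_rename(folder, before, "Copper_2020-01-01.xls")` just after; the file name here is only an example. The same pair of calls could be repeated for each material page.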
