Esclass

Reputation: 3

Python web scraping / data extraction

For my master's thesis, I am exploring the possibility of extracting data from a website via web automation. The steps are as follows:

  1. Sign in to the website ( https://www.metal.com/Copper/201102250376 )
  2. Input username and password
  3. Click sign-in
  4. Change date to 01/01/2020
  5. Scrape the generated table data and save it to a CSV file
  6. Save to a specific folder with a specific name on my PC
  7. Run the same sequence to download additional historical price data for other materials in a new tab in the same browser window

I am stuck on steps 5, 6 and 7. This is my code so far:

import time

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Raw string so the backslashes in the Windows path are not read as escapes
DRIVER_PATH = r'C:\webdriver\chromedriver.exe'
options = webdriver.ChromeOptions()
# Note: newer Selenium 4 releases take a Service object instead of executable_path
driver = webdriver.Chrome(executable_path=DRIVER_PATH, options=options)
driver.maximize_window()
driver.get('https://www.metal.com/Copper/201102250376')

# Login steps
LoginClick1 = driver.find_element(By.CSS_SELECTOR, '#__next > div > div.smm-component-header-en > div.main > div.right > button.button.sign-in')
LoginClick1.click()
user_input = driver.find_element(By.ID, 'user_name')
user_input.send_keys('#####')
password_input = driver.find_element(By.ID, 'password')
password_input.send_keys('####')
Submit = driver.find_element(By.CSS_SELECTOR, 'body > div:nth-child(17) > div > div.ant-modal-wrap.ant-modal-centered.smm-component-sign-en > div > div.ant-modal-content > div > div > div > div.smm-component-sign-en-content > form > div:nth-child(3) > div > div > span > button')
Submit.click()
time.sleep(2)

# Scroll down to the point of interest on the page
driver.execute_script("window.scrollBy(0,1000)", "")

# Change currency
driver.find_element(By.XPATH, "//img[contains(@class,'icon___BUqam')]").click()
time.sleep(1)

# Change date from the date picker
date_input = driver.find_element(By.XPATH, '//*[@id="__next"]/div/div[5]/div[1]/div[7]/div[1]/div[2]/div[1]/span[1]/div/i')
date_input.click()

# Clear the existing date (10 characters), type the new one and confirm
action = ActionChains(driver)
action.move_to_element(date_input).send_keys(Keys.BACKSPACE * 10).send_keys("01/01/2020").send_keys(Keys.ENTER).perform()
time.sleep(2)

I am stuck trying to scrape the data from the generated table and save it to a CSV file using Selenium. A row of the generated table looks like this:

**May 27, 2022** **10,758.75-10,788.43** **10,773.59** **+97.94** **USD/mt**

Any help would be massively appreciated.

Downloading the file by pressing the download buttons:

driver.find_element(By.XPATH,"//img[contains(@src,'https://static.metal.com/www.metal.com/4.1.161/static/images/price/download.png')]").click()

time.sleep(1)

driver.find_element(By.XPATH,"//img[contains(@src,'https://static.metal.com/www.metal.com/4.1.161/static/images/price/download_excel.png')]").click()

To save time since I have multiple files/data to download, I am also exploring the possibility of directly saving the file via the download button provided.

Do you have any idea how to go about this?
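One possible way to handle the direct download, sketched under assumptions: Chrome takes its download folder from the download.default_directory preference, which Selenium can set through add_experimental_option. The helper names (download_prefs, make_driver, wait_for_download, move_download) are hypothetical, and the sketch assumes the site delivers the Excel file as an ordinary browser download.

```python
import os
import shutil
import time


def download_prefs(folder):
    """Chrome preferences that send downloads to `folder` without a save dialog."""
    return {
        "download.default_directory": os.path.abspath(folder),
        "download.prompt_for_download": False,
    }


def make_driver(folder):
    """Start Chrome with the download preferences applied.

    Assumes selenium is installed and chromedriver can be found
    (on PATH or resolved by Selenium itself).
    """
    # Imported here so the file helpers below can be used without a browser
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", download_prefs(folder))
    return webdriver.Chrome(options=options)


def wait_for_download(folder, before, timeout=30.0, poll=0.5):
    """Return the path of the first finished file in `folder` not in `before`.

    Chrome gives unfinished downloads a .crdownload suffix, so those are skipped.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new = [name for name in sorted(os.listdir(folder))
               if name not in before and not name.endswith(".crdownload")]
        if new:
            return os.path.join(folder, new[0])
        time.sleep(poll)
    raise TimeoutError(f"no finished download appeared in {folder!r}")


def move_download(path, target_folder, new_name):
    """Move a finished download into `target_folder` under `new_name`."""
    os.makedirs(target_folder, exist_ok=True)
    target = os.path.join(target_folder, new_name)
    shutil.move(path, target)
    return target
```

A typical sequence would be to record before = set(os.listdir(folder)), click the two download buttons, then call wait_for_download(folder, before) followed by move_download(...) with the folder and file name you want.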

Upvotes: 0

Views: 558

Answers (1)

Darshan Shah

Reputation: 346

The sign-in button is not being clicked because the XPath //*[@id="__next"]/div/div[3]/div[2]/div[2]/button[2] is incorrect. The element with id __next is the main container div, and the rest of the path navigates down the HTML node structure to reach the sign-in button.

Instead, you can select the sign-in button directly by its class value with //button[@class='button sign-in'] (refer to the attached image).

Your sign-in code would then look like this:

from selenium import webdriver
from selenium.webdriver.common.by import By

# Raw string so the backslashes in the Windows path are not read as escapes
driver = webdriver.Chrome(executable_path=r'C:\webdrivers\chromedriver.exe')
driver.maximize_window()
driver.get('https://www.metal.com/Nickel/201102250239')
# Click on Sign In
driver.find_element(By.XPATH, "//button[@class='button sign-in']").click()
# Enter username
driver.find_element(By.ID, "user_name").send_keys("your username")
# Enter password
driver.find_element(By.ID, "password").send_keys("your password")
# Click Sign In
driver.find_element(By.XPATH, "//button[@type='submit']").click()

To scrape the data:

# Each table row is a container whose child divs hold the individual cells
for element in driver.find_elements(By.CLASS_NAME, "historyBodyRow___1Bk9u"):
    elements = element.find_elements(By.TAG_NAME, "div")
    print("Date=" + elements[0].text)
    print("Price Range=" + elements[1].text)
    print("Avg=" + elements[2].text)
    print("Change=" + elements[3].text)
    print("Unit=" + elements[4].text)
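The cells scraped above are plain strings, so it may help to convert them before analysis. The following is only a sketch: parse_price_row and to_number are hypothetical helpers, and the formats (a date like "May 27, 2022", a range like "10,758.75-10,788.43") are assumed from the single sample row in the question.

```python
from datetime import datetime


def to_number(text):
    """Turn '10,773.59' or '+97.94' into a float."""
    return float(text.strip().replace(",", ""))


def parse_date(text):
    """Parse 'May 27, 2022'; tries short and full English month names."""
    for fmt in ("%b %d, %Y", "%B %d, %Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def parse_price_row(cells):
    """Convert the five text cells of one table row into typed values.

    Assumed cell order: date, price range, average, change, unit.
    """
    date_text, range_text, avg_text, change_text, unit = cells[:5]
    # The range is assumed to be 'low-high' with no negative prices
    low_text, high_text = range_text.split("-", 1)
    return {
        "date": parse_date(date_text).isoformat(),
        "low": to_number(low_text),
        "high": to_number(high_text),
        "average": to_number(avg_text),
        "change": to_number(change_text),
        "unit": unit.strip(),
    }
```

With Selenium, the input would be something like [e.text for e in elements] for each row found by the loop above.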

To add the rows to a CSV file:

import csv

# The with block closes the file automatically; newline='' avoids blank lines between rows on Windows
with open('Path where you want to store the file', 'w', newline='') as f:
    writer = csv.writer(f)
    for element in driver.find_elements(By.CLASS_NAME, "historyBodyRow___1Bk9u"):
        elements = element.find_elements(By.TAG_NAME, "div")
        entry = [elements[0].text, elements[1].text, elements[2].text, elements[3].text, elements[4].text]
        writer.writerow(entry)
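For steps 6 and 7 of the question (a specific folder and file name, and repeating the sequence for other materials in new tabs), a rough sketch could look like the following. The function names, the file naming scheme and the scrape_rows callback are all assumptions made for illustration; scrape_rows would contain the date change and the row loop shown above. It also assumes you have already signed in once, so that new tabs in the same window share the session.

```python
import csv
import os


def csv_path(folder, material, start_date):
    """Build a file path such as <folder>/Copper_2020-01-01.csv (naming is an assumption)."""
    return os.path.join(folder, f"{material}_{start_date}.csv")


def save_rows(path, rows, header=("Date", "Price Range", "Avg", "Change", "Unit")):
    """Write a header line and the scraped rows to `path`, creating the folder if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def scrape_materials(driver, pages, folder, start_date, scrape_rows):
    """Open each page in a new tab of the same window and save its rows.

    `pages` maps a material name to its URL. `scrape_rows(driver)` is a
    caller-supplied function that returns a list of row lists for the
    tab the driver is currently switched to.
    """
    saved = []
    for material, url in pages.items():
        # Open the URL in a new tab, then point the driver at it.
        # Treating the last handle as the newest tab is a common convention,
        # not something the WebDriver specification guarantees.
        driver.execute_script("window.open(arguments[0], '_blank');", url)
        driver.switch_to.window(driver.window_handles[-1])
        path = csv_path(folder, material, start_date)
        save_rows(path, scrape_rows(driver))
        saved.append(path)
    return saved
```

It could be called with pages = {"Copper": "https://www.metal.com/Copper/201102250376", "Nickel": "https://www.metal.com/Nickel/201102250239"}, a folder of your choice, and a start date string such as "2020-01-01".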

Upvotes: 1
