Jagr

Reputation: 363

Python web scraping using Selenium - iterate through href link

I am trying to write a script that uses Selenium to download many files containing different NHL players' game-log information. I want to download a file for each player in the following table: https://www.naturalstattrick.com/playerteams.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&score=all&stdoi=std&rate=y&team=ALL&pos=S&loc=B&toi=0.1&gpfilt=none&fd=&td=&tgp=410&lines=single

Once on that website, I want to click on every player's name in the table. When a player's name is clicked through its href link, a new window opens. There are a few drop-down menus at the top. I want to select "Rate" instead of "Counts", select "Game Log" instead of "Player Summary", and then click "Submit". Finally, I want to click on CSV(All) at the bottom to download a CSV file.

Here is my current code:

from selenium import webdriver
import csv
from selenium.webdriver.support.ui import Select
from datetime import date, timedelta
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC

chromedriver =("C:/Users/Michel/Desktop/python/package/chromedriver_win32/chromedriver.exe")
driver = webdriver.Chrome(chromedriver)

driver.get("https://www.naturalstattrick.com/playerteams.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&score=all&stdoi=std&rate=y&team=ALL&pos=S&loc=B&toi=0.1&gpfilt=none&fd=&td=&tgp=410&lines=single")
table = driver.find_element_by_xpath("//table[@class='indreg dataTable no-footer DTFC_Cloned']")
for row in table.find_elements_by_xpath("//tr[@role='row']")
    links = driver.find_element_by_xpath('//a[@href]')
    links.click()
    select = Select(driver.find_element_by_name('rate'))
    select.select_by_value("y")
    select1 = Select(driver.find_element_by_name('v'))
    select1.select_by_value("g")
    select2 = Select(driver.find_element_by_type('submit'))
    select2.select_by_value("submit")
    WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH , '//div[@class="dt-button button-csv button-htm15"]')))
    CSVall = driver.find_element_by_xpath('//div[@class="dt-button button-csv button-htm15"]')
    CSVall.click()
driver.close()

I have tried changing different things, but I always get an error. Where is the problem?

Moreover, I think I should probably add a wait after "driver.get", because the website takes a few seconds to load. I do not know what expected condition should end the wait in this case.
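
For example, I was thinking of something along these lines, but I am not sure whether waiting for the players table is the right condition:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# rough idea: wait (up to 15 s) until the players table is present
# before reading any rows from it
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located(
        (By.XPATH, "//table[@class='indreg dataTable no-footer DTFC_Cloned']")
    )
)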

Thanks

Upvotes: 1

Views: 1077

Answers (2)

QHarr

Reputation: 84465

Rather than keep clicking through selections, you could grab the playerIds from the first page and concatenate them, along with the query-string parameters representing the selections for Rate and Game Log, into the URL for each player report. I'm sure you can tidy up the following.

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC

def getPlayerId(url):
    # pull the playerid value out of the link's query string
    id = url.split('playerid=')[1]
    id = id.split('&')[0]
    return id

def makeNewURL(playerId):
    # player report URL with Rate (rate=y) and Game Log (v=g) pre-selected
    return 'https://www.naturalstattrick.com/playerreport.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&stdoi=oi&rate=y&v=g&playerid=' + playerId

#chromedriver =("C:/Users/Michel/Desktop/python/package/chromedriver_win32/chromedriver.exe")
driver = webdriver.Chrome()

driver.get("https://www.naturalstattrick.com/playerteams.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&score=all&stdoi=std&rate=y&team=ALL&pos=S&loc=B&toi=0.1&gpfilt=none&fd=&td=&tgp=410&lines=single")

# collect the player links (those with a playerid) from the table on the first page
links = driver.find_elements_by_css_selector('table.indreg.dataTable.no-footer.DTFC_Cloned [href*=playerid]')
newLinks = []

for link in links:
    newLinks.append(link.get_attribute('href'))

for link in newLinks:
    playerId = getPlayerId(link)
    link = makeNewURL(playerId)
    driver.get(link)
    # wait for the export buttons to render, then click the second one: CSV (All)
    WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH , '//a[@class="dt-button buttons-csv buttons-html5"][2]')))
    CSVall = driver.find_element_by_xpath('//a[@class="dt-button buttons-csv buttons-html5"][2]')
    CSVall.click()
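
If you want the CSVs to land in a specific folder rather than the browser's default download location, you could also pass download preferences to Chrome when creating the driver. Untested sketch; the target folder is just an example:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_experimental_option('prefs', {
    'download.default_directory': r'C:\Users\Michel\Desktop\csv',  # example folder
    'download.prompt_for_download': False
})
driver = webdriver.Chrome(chrome_options=options)  # "options=options" on newer Selenium versions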

Upvotes: 1

ewwink

Reputation: 19164

You don't need to click each player link; you can save the URLs as a list instead. There are also several errors in your code. You can see working code below.

from selenium import webdriver
import csv
from selenium.webdriver.support.ui import Select
from datetime import date, timedelta
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC

chromedriver =("C:/Users/Michel/Desktop/python/package/chromedriver_win32/chromedriver.exe")
driver = webdriver.Chrome(chromedriver)

driver.get("https://www.naturalstattrick.com/playerteams.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&score=all&stdoi=std&rate=y&team=ALL&pos=S&loc=B&toi=0.1&gpfilt=none&fd=&td=&tgp=410&lines=single")

# collect the player profile URLs from the table instead of clicking each link
playerLinks = driver.find_elements_by_xpath("//table[@class='indreg dataTable no-footer DTFC_Cloned']//a")
playerLinks = [p.get_attribute('href') for p in playerLinks]

print(len(playerLinks))

for url in playerLinks:
    driver.get(url)
    # select "Rate" and "Game Log" in the drop-downs, then submit the form
    select = Select(driver.find_element_by_name('rate'))
    select.select_by_value("y")
    select1 = Select(driver.find_element_by_name('v'))
    select1.select_by_value("g")
    driver.find_element_by_css_selector('input[type="submit"]').click()
    # wait for the export buttons to render, then click CSV (All)
    WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH , '//a[@class="dt-button buttons-csv buttons-html5"][2]')))
    CSVall = driver.find_element_by_xpath('//a[@class="dt-button buttons-csv buttons-html5"][2]')
    CSVall.click()

driver.close()
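
One thing to watch for: calling driver.get() for the next player can interrupt a CSV download that has not finished yet. A simple workaround is to poll the download folder for Chrome's temporary .crdownload files before moving on. Sketch below; the download path is just an example and depends on your Chrome settings:

import glob
import time

def wait_for_downloads(folder, timeout=30):
    # Chrome keeps in-progress downloads as *.crdownload files;
    # return once none are left, or give up when the timeout expires
    end = time.time() + timeout
    while time.time() < end:
        if not glob.glob(folder + '/*.crdownload'):
            return True
        time.sleep(0.5)
    return False

# e.g. call wait_for_downloads('C:/Users/Michel/Downloads') after CSVall.click()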

Upvotes: 1
