Reputation: 363
I want to store in a data frame all the teams for the NHL $30K Finnish Flash on 2019-01-10. So far I am only able to store the teams from the first page. Moreover, if a user entered two different teams, his highest-ranking team is stored both times... Here is my code:
# Packages:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import pandas as pd
import time

# Driver
chromedriver = "C:/Users/Michel/Desktop/python/package/chromedriver_win32/chromedriver.exe"
driver = webdriver.Chrome(chromedriver)

# DataFrame that will be used later
results = pd.DataFrame()
calendar = []
calendar.append("2019-01-10")

for d in calendar:
    driver.get("https://rotogrinders.com/resultsdb/date/" + d + "/sport/4/")
    time.sleep(10)
    contest = driver.find_element_by_xpath("//*[@id='root']/div/main/main/div[2]/div[3]/div/div/div[1]/div/div/div/div/div[3]")
    contest.click()
    list_links = driver.find_elements_by_tag_name('a')
    hlink = []
    for ii in list_links:
        hlink.append(ii.get_attribute("href"))
    sub = "https://rotogrinders.com/resultsdb"
    con = "contest"
    contest_list = []
    for text in hlink:
        if sub in text:
            if con in text:
                contest_list.append(text)
    c = contest_list[2]
    driver.get(c)
    WebDriverWait(driver, 60).until(ec.presence_of_element_located((By.XPATH, './/tbody//tr//td//span//a[text() != ""]')))
    # Get tables to get the user names
    tables = pd.read_html(driver.page_source)
    users_df = tables[0][['Rank', 'User']]
    users_df['User'] = users_df['User'].str.replace(' Member', '')
    # Iterate through users and build the results dataframe
    for i, row in users_df.iterrows():
        rank = row['Rank']
        user = row['User']
        # Find the user name and click on the name
        user_link = driver.find_elements(By.XPATH, "//a[text()='%s']" % (user))[0]
        user_link.click()
        # Get the lineup table after clicking on the user name
        tables = pd.read_html(driver.page_source)
        lineup = tables[1]
        # Restructure to put into the results dataframe
        lineup.loc[9, 'Name'] = lineup.iloc[9]['Salary']
        lineup.loc[10, 'Name'] = lineup.iloc[9]['Pts']
        temp_df = pd.DataFrame(lineup['Name'].values.reshape(-1, 11),
                               columns=lineup['Pos'].iloc[:9].tolist() + ['Total_$', 'Total_Pts'])
        temp_df.insert(loc=0, column='User', value=user)
        temp_df.insert(loc=0, column='Rank', value=rank)
        temp_df["Date"] = d
        results = results.append(temp_df)
        results = results.reset_index(drop=True)

driver.close()
So, I would like:
1) To iterate through all pages:
I did locate the next-page button with:
next_button = driver.find_elements_by_xpath("//button[@type='button']")
But I am not able to add that step to my for loop; the first sketch below shows roughly what I am aiming at.
2) To access the different user_link elements if a user entered the contest more than once. I think maybe I could do it with a for loop using the frequency of each user, like this:
users_df.groupby("User").count()
for i in range(users_df[user, "Number"]):
    user_link = driver.find_elements(By.XPATH, "//a[text()='%s']" % (user))[i]
    user_link.click()
But I always get error messages when adding those steps. Or, if it does run, it simply skips the part that stores all the teams row by row and quickly closes the driver... The second sketch below shows roughly what I have in mind.
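For 1), this is the kind of loop I am trying to get to (only a sketch, reusing the driver from the code above; it assumes the last //button[@type='button'] on the page is the next-page button and that it is disabled on the last page, which I have not verified):

# Rough sketch: loop over result pages by clicking the next-page button.
# Assumes the last //button[@type='button'] is the pagination "next" button
# and that it is no longer enabled on the last page.
while True:
    # ... scrape the user rows of the current page here ...

    next_buttons = driver.find_elements_by_xpath("//button[@type='button']")
    if not next_buttons or not next_buttons[-1].is_enabled():
        break  # no next page left
    next_buttons[-1].click()
    time.sleep(5)  # give the next page time to load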
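For 2), this is roughly the direction I am thinking of (again only a sketch, reusing users_df and driver from the code above; keeping a running count per user is an idea of mine, not something I have working):

# Rough sketch: handle users that entered more than one lineup.
# Keep a running count of how many times each user has been seen so far,
# and click the matching occurrence of that user's link.
seen = {}
for i, row in users_df.iterrows():
    rank = row['Rank']
    user = row['User']
    occurrence = seen.get(user, 0)
    user_links = driver.find_elements(By.XPATH, "//a[text()='%s']" % user)
    user_link = user_links[occurrence]  # 0 for the first entry, 1 for the second, ...
    user_link.click()
    seen[user] = occurrence + 1
    # ... read the lineup table and append to results as above ...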
Upvotes: 0
Views: 410
Reputation: 109
My suggestion:
It will be enough to use just requests (or any other equivalent module) to get the data from the server, because the service you want to scrape has an API server; for example, check the link. The example uses the first end-point:
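For illustration, a minimal sketch of the idea; the end-point URL and parameter names below are placeholders I made up, so take the real ones from the API link:

import requests
import pandas as pd

# Placeholder end-point and parameters -- these are assumptions, not the
# real API; substitute the actual end-point taken from the API link.
url = "https://example-api.rotogrinders.com/contests"
params = {"date": "2019-01-10", "sport": 4}

resp = requests.get(url, params=params)
resp.raise_for_status()
data = resp.json()

# Work with the JSON payload directly instead of scraping rendered HTML.
df = pd.DataFrame(data)
print(df.head())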
Hope this makes your task easier.
Upvotes: 1