Reputation: 363
I want to store in a data frame all the teams for the NHL $30K Finnish Flash on 2019-01-10. So far I am only able to store the teams from the first page. Moreover, if a user entered two different teams, his highest-ranking team is stored both times... Here is my code:
# Packages:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import pandas as pd
import time

# Driver
chromedriver = "C:/Users/Michel/Desktop/python/package/chromedriver_win32/chromedriver.exe"
driver = webdriver.Chrome(chromedriver)

# DataFrame that will be used later
results = pd.DataFrame()
calendar = []
calendar.append("2019-01-10")

for d in calendar:
    driver.get("https://rotogrinders.com/resultsdb/date/" + d + "/sport/4/")
    time.sleep(10)
    contest = driver.find_element_by_xpath("//*[@id='root']/div/main/main/div[2]/div[3]/div/div/div[1]/div/div/div/div/div[3]")
    contest.click()
    list_links = driver.find_elements_by_tag_name('a')
    hlink = []
    for ii in list_links:
        hlink.append(ii.get_attribute("href"))
    sub = "https://rotogrinders.com/resultsdb"
    con = "contest"
    contest_list = []
    for text in hlink:
        if sub in text:
            if con in text:
                contest_list.append(text)
    c = contest_list[2]
    driver.get(c)
    WebDriverWait(driver, 60).until(ec.presence_of_element_located((By.XPATH, './/tbody//tr//td//span//a[text() != ""]')))
    # Get tables to get the user names
    tables = pd.read_html(driver.page_source)
    users_df = tables[0][['Rank', 'User']]
    users_df['User'] = users_df['User'].str.replace(' Member', '')
    # Iterate through users and build the results dataframe
    for i, row in users_df.iterrows():
        rank = row['Rank']
        user = row['User']
        # Find the user name and click on the name
        user_link = driver.find_elements(By.XPATH, "//a[text()='%s']" % (user))[0]
        user_link.click()
        # Get the lineup table after clicking on the user name
        tables = pd.read_html(driver.page_source)
        lineup = tables[1]
        # Restructure to put into the results dataframe
        lineup.loc[9, 'Name'] = lineup.iloc[9]['Salary']
        lineup.loc[10, 'Name'] = lineup.iloc[9]['Pts']
        temp_df = pd.DataFrame(lineup['Name'].values.reshape(-1, 11),
                               columns=lineup['Pos'].iloc[:9].tolist() + ['Total_$', 'Total_Pts'])
        temp_df.insert(loc=0, column='User', value=user)
        temp_df.insert(loc=0, column='Rank', value=rank)
        temp_df["Date"] = d
        results = results.append(temp_df)
        results = results.reset_index(drop=True)

driver.close()
So, I would like:
1) To iterate through all pages:
I did locate the next-page button with:
next_button = driver.find_elements_by_xpath("//button[@type='button']")
But I am not able to add that step to my for loop; the first sketch below shows roughly what I am aiming at.
2) To access the different user_link elements if a user entered the contest more than once. I think maybe I could do it with a for loop using the frequency of each user, like this:
users_df.groupby("User").count()
for i in range(users_df[user, "Number"]):
    user_link = driver.find_elements(By.XPATH, "//a[text()='%s']" % (user))[i]
    user_link.click()
But I always get error messages when adding those steps. Or, if it does run, it simply skips the part that stores all the teams row by row and quickly closes the driver... The second sketch below shows roughly what I have in mind.
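For 1), this is the kind of loop I am trying to get to (only a sketch, reusing the driver from the code above; it assumes the last //button[@type='button'] on the page is the next-page button and that it is disabled on the last page, which I have not verified):

# Rough sketch: loop over result pages by clicking the next-page button.
# Assumes the last //button[@type='button'] is the pagination "next" button
# and that it is no longer enabled on the last page.
while True:
    # ... scrape the user rows of the current page here ...

    next_buttons = driver.find_elements_by_xpath("//button[@type='button']")
    if not next_buttons or not next_buttons[-1].is_enabled():
        break  # no next page left
    next_buttons[-1].click()
    time.sleep(5)  # give the next page time to load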
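For 2), this is roughly the direction I am thinking of (again only a sketch, reusing users_df and driver from the code above; keeping a running count per user is an idea of mine, not something I have working):

# Rough sketch: handle users that entered more than one lineup.
# Keep a running count of how many times each user has been seen so far,
# and click the matching occurrence of that user's link.
seen = {}
for i, row in users_df.iterrows():
    rank = row['Rank']
    user = row['User']
    occurrence = seen.get(user, 0)
    user_links = driver.find_elements(By.XPATH, "//a[text()='%s']" % user)
    user_link = user_links[occurrence]  # 0 for the first entry, 1 for the second, ...
    user_link.click()
    seen[user] = occurrence + 1
    # ... read the lineup table and append to results as above ...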
Upvotes: 0
Views: 410
Reputation: 109
My suggestion:
It will be enough to use just requests (or any other equivalent module) to get the data from the server, because the service you want to scrape has an API server; for example, check the link. The example uses the first end-point:
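For illustration, a minimal sketch of the idea; the end-point URL and parameter names below are placeholders I made up, so take the real ones from the API link:

import requests
import pandas as pd

# Placeholder end-point and parameters -- these are assumptions, not the
# real API; substitute the actual end-point taken from the API link.
url = "https://example-api.rotogrinders.com/contests"
params = {"date": "2019-01-10", "sport": 4}

resp = requests.get(url, params=params)
resp.raise_for_status()
data = resp.json()

# Work with the JSON payload directly instead of scraping rendered HTML.
df = pd.DataFrame(data)
print(df.head())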
Hope this makes your task easier.
Upvotes: 1