Reputation: 25
So I was working on getting a file from https://www.pbpstats.com/totals/nba/player. I'm using Selenium with the Chrome webdriver. I can't quite figure out how to click the "Get Stats" button. I can do this manually, but would like to do it through the HTML and Selenium.
I have tried this:
browser = webdriver.Chrome()
browser.get('https://www.pbpstats.com/totals/nba/player')
element = browser.find_elements_by_tag_name('button')
element.click()
But nothing happens. I'm not sure how to interpret the output of find_elements_by_tag_name. I get something like selenium.webdriver.remote.webelement.WebElement (session="14bacd9bab4b484952ba872ea0373663", element="4ef4e9da-b193-46a8-8209-265b8bef3f05") (the values after the equals signs differ).
Upvotes: 2
Views: 222
Reputation: 28575
Selenium is a bit overkill here, since the data is returned from the API. Just grab the data from there. You'll also get all the data at once (all 248 columns) and won't have to go through each dropdown for Scoring, Assists, Rebounds, etc.
If you want per-game and/or per-100-possession numbers, it's just a matter of dividing the numeric columns by the 'GP' column, or dividing them by the 'Possessions' column and multiplying by 100, once you have the dataframe.
import requests
import pandas as pd
url = 'https://api.pbpstats.com/get-totals/nba'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36'}
payload = {
'Season': '2020-21',
'SeasonType': 'Regular+Season',
'Type': 'Player'}
jsonData = requests.get(url, headers=headers, params=payload).json()
df = pd.DataFrame(jsonData['multi_row_table_data'])
df.to_csv('pbpstats_export.csv', index=False)
The data will come with the columns in alphabetical order, so if you want to move some of them to the front before writing to file, you can list which columns should come first. I just chose the name and team, since those are typically the first two columns in a sports table:
# If you want to reorder the first few columns. Otherwise columns are in alphabetical order
reorder = ['Name','TeamAbbreviation']
for col in reversed(reorder):
    col = df.pop(col)
    df.insert(0, col.name, col)
Output:
print(df)
Name TeamAbbreviation 2pt And 1 Free Throw Trips \
0 Julius Randle NYK 35.0
1 Nikola Jokic DEN 24.0
2 Buddy Hield SAC 3.0
3 Domantas Sabonis IND 32.0
4 RJ Barrett NYK 21.0
.. ... ... ...
495 Jontay Porter MEM NaN
496 Ty-Shon Alexander PHX NaN
497 Rayjon Tucker PHI NaN
498 Brian Bowen II IND 1.0
499 Jared Harper NYK NaN
Arc3Accuracy Arc3Assists ... BlockedCorner3 Period3Fouls5Minutes \
0 0.408284 90.0 ... NaN NaN
1 0.422360 87.0 ... NaN NaN
2 0.365796 22.0 ... NaN NaN
3 0.281818 101.0 ... NaN NaN
4 0.318681 32.0 ... NaN NaN
.. ... ... ... ... ...
495 0.500000 NaN ... NaN NaN
496 NaN NaN ... NaN NaN
497 1.000000 NaN ... NaN NaN
498 NaN NaN ... NaN NaN
499 NaN NaN ... NaN NaN
HeaveMakes Period1Fouls3Minutes Period2Fouls4Minutes
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
.. ... ... ...
495 NaN NaN NaN
496 NaN NaN NaN
497 NaN NaN NaN
498 NaN NaN NaN
499 NaN NaN NaN
[500 rows x 248 columns]
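The per-game and per-100-possession division mentioned above could be sketched roughly like this. It is a minimal sketch, not part of the original export code: the helper name scale_by, its keep parameter, the demo frame, and the 'Points' column are all made up for illustration; only 'GP' and 'Possessions' are taken from the description above. Rate columns that are already ratios (Arc3Accuracy, for example) would presumably need to go into keep as well so they are not divided again.

```python
import pandas as pd

def scale_by(df, denominator, factor=1.0, keep=()):
    """Return a copy of df whose numeric columns are multiplied by
    `factor` and divided by df[denominator].

    The denominator column and any columns named in `keep` are left as-is.
    (Hypothetical helper, for illustration only.)
    """
    out = df.copy()
    untouched = set(keep) | {denominator}
    for col in df.select_dtypes(include="number").columns:
        if col not in untouched:
            out[col] = df[col] * factor / df[denominator]
    return out

# Tiny made-up frame standing in for the real export (values are illustrative)
demo = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "GP": [10, 20],
    "Possessions": [500, 1000],
    "Points": [200.0, 300.0],
})

per_game = scale_by(demo, "GP", keep=["Possessions"])
per_100 = scale_by(demo, "Possessions", factor=100, keep=["GP"])
print(per_game)
print(per_100)
```

With the real data you would pass df instead of demo, and add to keep whichever columns should not be rescaled.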
Upvotes: 2
Reputation: 3541
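Try this, it should help:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
browser = webdriver.Chrome()
browser.get('https://www.pbpstats.com/totals/nba/player')
# Give the page a few seconds to render the button
time.sleep(4)
# find_element(By.XPATH, ...) also works on Selenium 4, where find_element_by_xpath was removed
element = browser.find_element(By.XPATH, "//button[text()='Get Stats']")
element.click()
Or you can use an explicit wait, like this: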
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
timeout = 30
browser = webdriver.Chrome()
browser.get('https://www.pbpstats.com/totals/nba/player')
myElem = WebDriverWait(browser, timeout).until(EC.element_to_be_clickable((By.XPATH, "//button[text()='Get Stats']")))
myElem.click()
Upvotes: 2
Reputation: 663
You can perform a click like this:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
browser = webdriver.Chrome()
browser.get('https://www.pbpstats.com/totals/nba/player')
time.sleep(5)
element = browser.find_element(By.XPATH, '//*[@id="totals"]/main/div[3]/div/button[1]')
element.click()
Upvotes: 1