Reputation: 35
I'm trying to build a pipeline that loads every player who has played in the NBA into my SQL database, along with each player's unique player ID (as shown in the image below), using this webpage.
How the IDs Manifest Themselves
I was able to do this in Python (producing a CSV instead) by manually copying a variable from the stats_ptsd.js file, which I found in the network responses when I inspected the page, into a list. I'm not showing the Python code because it doesn't scrape the page; it only reads that manually copied list.
Network Responses
How the CSV Looks
Now I'm not sure how to scrape this information with R. I've tried many methods I've seen around the internet, most of them using the rvest package, but to no avail, and so far I have no meaningful output or error message to show. Hopefully someone can suggest the best way to do this, whether by accessing the .js file or by scraping the HTML elements. The XPath for a player's 'a' HTML element with the valid href is shown below.
//*[contains(concat( " ", @class, " " ), concat( " ", "players-list__name", " " )) and (((count(preceding-sibling::*) + 1) = 91) and parent::*)]//a
Upvotes: 1
Views: 368
Reputation: 84465
The data comes from a JavaScript file that you can find in the network tab. You can use a regex or substring to extract the JavaScript object assigned inside it and then parse that text with a JSON parser.
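library(rvest)
library(stringr)
library(magrittr)
library(jsonlite)

# Fetch the JavaScript file as text (read_html wraps it in a body node)
r <- read_html('https://stats.nba.com/js/data/ptsd/stats_ptsd.js') %>%
  html_node('body') %>%
  html_text() %>%
  toString()

# Capture the object assigned to stats_ptsd, up to the last semicolon
data <- str_match_all(r, 'stats_ptsd = (.*);')
# Parse the captured text as JSON and keep the players array
data <- data.frame(jsonlite::fromJSON(data[[1]][,2])$data$players)
write.csv(data, file = "players.csv")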
You could also subset and re-order before writing out:
df <- setNames(data[,c("X2","X1")],c("Name","Id"))
write.csv(df,file="players.csv")
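To see what the regex and JSON steps do without touching the network, here is a minimal sketch that runs on a made-up string shaped like the file. The sample contents, the assumed field order (ID first, name second, matching the X1/X2 columns above) and the helper name extract_players are illustrative assumptions, not the real file layout.

```r
library(jsonlite)

# Hypothetical miniature of the file contents; the real file is much larger
# and its exact layout may differ.
js_text <- 'var stats_ptsd = {"data":{"players":[[1,"Player, A"],[2,"Player, B"]]}};'

# Hypothetical helper: pull out the object literal and parse it as JSON.
extract_players <- function(txt) {
  # Capture everything between "stats_ptsd = " and the last semicolon
  m <- regmatches(txt, regexec("stats_ptsd = (.*);", txt))[[1]]
  if (length(m) < 2) stop("stats_ptsd assignment not found")
  # Keep the nested lists as-is so mixed types are not coerced unexpectedly
  players <- fromJSON(m[2], simplifyVector = FALSE)$data$players
  data.frame(
    Id   = vapply(players, function(p) as.character(p[[1]]), character(1)),
    Name = vapply(players, function(p) as.character(p[[2]]), character(1)),
    stringsAsFactors = FALSE
  )
}

players_df <- extract_players(js_text)
print(players_df)
```

Building the columns explicitly avoids relying on the auto-generated X1/X2 names, at the cost of assuming which position holds which field.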
Upvotes: 1