Reputation: 13
To teach myself rvest, I tried scraping a website whose data is already laid out as a table. The problem is that I can't extract the underlying table data.
The only thing I really need is the player column.
library(tidyverse)
library(rvest)
base <- "https://www.milb.com/stats/"
base2 <- "?page="
base3 <- "&playerPool=ALL"
html <- read_html(paste0(base,"pacific-coast/","2017",base2,"2",base3))
html2 <- html %>% html_element("#stats-app-root")
html3 <- html2 %>% html_text("#stats-body-table player")
https://www.milb.com/stats/pacific-coast/2017?page=2&playerPool=ALL (easy way to see actual example url)
html2 appears to work, but I'm stuck on what to do from there. A couple of different attempts just hit a wall.
Once this works, I'll replace the text with numbers and add a few for loops (which seems pretty simple).
Upvotes: 0
Views: 87
Reputation: 6669
If you inspect the page in Chrome and watch the Network tab, you'll see that it makes a call to download a JSON file. Just request that file yourself:
library(jsonlite)
data <- fromJSON("https://bdfed.stitch.mlbinfra.com/bdfed/stats/player?stitch_env=prod&season=2017&sportId=11&stats=season&group=hitting&gameType=R&offset=25&sortStat=onBasePlusSlugging&order=desc&playerPool=ALL&leagueIds=112")
df <- data$stats
head(df)
  year playerId       playerName   type rank   playerFullName
1 2017   643256      Adam Cimber player   26      Adam Cimber
2 2017   458547   Vladimir Frias player   27   Vladimir Frias
3 2017   643265   Garrett Cooper player   28   Garrett Cooper
4 2017   542979     Keon Broxton player   29     Keon Broxton
5 2017   600301    Taylor Motter player   30    Taylor Motter
6 2017   624414 Christian Arroyo player   31 Christian Arroyo
...
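To get from there to what the question asks for (just the player column, over several pages), something like this might work. Treat it as a sketch: stats_url() and fetch_players() are made-up helper names, and the 25-rows-per-page rule is an assumption based on offset=25 returning ranks 26 onward for page 2. The other query parameters are copied unchanged from the URL above.

```r
# Hypothetical helper: rebuild the endpoint URL used above for a given
# season and page. Assumes 25 rows per page, so page 2 maps to offset = 25.
stats_url <- function(season, page, league_id = 112, per_page = 25) {
  paste0(
    "https://bdfed.stitch.mlbinfra.com/bdfed/stats/player",
    "?stitch_env=prod",
    "&season=", season,
    "&sportId=11&stats=season&group=hitting&gameType=R",
    "&offset=", as.integer((page - 1) * per_page),
    "&sortStat=onBasePlusSlugging&order=desc&playerPool=ALL",
    "&leagueIds=", league_id
  )
}

# Hypothetical helper: download each page and keep only the player names.
# Uses jsonlite::fromJSON, the same call as in the answer above.
fetch_players <- function(season, pages) {
  unlist(lapply(pages, function(p) {
    stats <- jsonlite::fromJSON(stats_url(season, p))$stats
    stats$playerFullName
  }))
}

# Usage (needs network access):
# players <- fetch_players(2017, 1:3)
```

stats_url(2017, 2) reproduces the URL used above exactly, which is a quick way to check the helper before looping over more pages or seasons.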
Upvotes: 1