Reputation: 3
I'm trying to scrape the data table at "https://stats.premierlacrosseleague.com/pll-team-table". I've tried several different approaches, but I always end up with an empty table. Does anyone have a solution? My code is below. Thanks in advance!
library(rvest)
pll <- read_html("https://stats.premierlacrosseleague.com/pll-team-table")
table <- pll %>% html_nodes(".jss820") %>% html_text()
data_table <- data.frame(table)
Upvotes: 0
Views: 62
Reputation: 4658
Unfortunately, scraping that way will not work, because the data is loaded dynamically after the page itself has loaded, so it is not in the HTML that read_html() receives. If you right-click the page, click 'Inspect element', go to the 'Network' tab, and refresh the page, you can see the XHR requests being made.
One of those requests is to https://api.stats.premierlacrosseleague.com/v1.00/teams-stats/all/2020, which returns the table you want as JSON. The code below reads that table with jsonlite (which gives a nested list in R) and turns it into a data.frame using unnest_wider:
library(tidyverse)
library(jsonlite)
url <- "https://api.stats.premierlacrosseleague.com/v1.00/teams-stats/all/2020"
data_list <- jsonlite::read_json(url)
data_table <- tibble(data = data_list) %>%
  unnest_wider(data)
This gives
# A tibble: 7 x 55
scores faceoffPct shotPct twoPointShotPct twoPointShotsOn… clearPct ridesPct savePct shortHandedPct
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 20 0.488 0.339 0.5 3.83 0.9 0 0.644 0
2 21 0.490 0.230 0.6 1.93 0.961 0.12 0.588 0
3 16 0.452 0.238 0.5 1.75 0.98 0.0769 0.623 0
4 25 0.667 0.293 0.545 2.73 0.932 0.0196 0.591 0
5 28 0.333 0.184 0.6 1.52 0.940 0.0263 0.559 0
6 17 0.523 0.239 0.8 4.2 0.935 0.0755 0.545 0
7 13 0.696 0.351 0.571 2.43 1 0.0870 0.682 0
# … with 46 more variables
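As a possible alternative, jsonlite::fromJSON() simplifies a JSON array of flat objects straight into a data.frame, which may save the unnest_wider step. The sketch below runs offline on a made-up sample string. The three field names are taken from the output above, but the two records and the assumption that the real response is a plain array of flat objects are illustrative only, not verified against the API.

```r
library(jsonlite)

# Hypothetical sample mimicking the assumed shape of the API response:
# a JSON array with one object per team (values are illustrative only)
sample_json <- '[
  {"scores": 20, "faceoffPct": 0.488, "shotPct": 0.339},
  {"scores": 21, "faceoffPct": 0.490, "shotPct": 0.230}
]'

# fromJSON() simplifies an array of flat objects into a data.frame by default
sample_table <- fromJSON(sample_json)

# Inspect the resulting columns and their types
str(sample_table)
```

If the real response behaves the same way, fromJSON(url) would be the one-line equivalent. If some fields turn out to be nested objects, fromJSON() returns nested data.frame columns, and jsonlite::flatten() may help there.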
Upvotes: 1