We are struggling to grab the main table at this fangraphs link using rvest:
library(rvest)  # also provides the %>% pipe

url = 'https://www.fangraphs.com/leaders/splits-leaderboards?splitArr=1&splitArrPitch=&position=B&autoPt=false&splitTeams=false&statType=team&statgroup=2&startDate=2021-07-07&endDate=2021-07-21&players=&filter=&groupBy=season&sort=9,1'
table_nodes = url %>% read_html() %>% html_nodes('table')
table_nodes
{xml_nodeset (7)}
[1] <table class="menu-standings-table"><tbody><tr>\n<td>\r\n <div class="menu-sub-header">AL East</div>\r\n ...
[2] <table class="menu-team-table">\n<tr>\n<td>\r\n <div class="menu-sub-header">AL East</div>\r\n ...
[3] <table class="menu-team-table">\n<tr>\n<td>\r\n <div class="menu-sub-header">AL East</div>\r\n ...
[4] <table>\n<tr>\n<td><a href="http://www.fangraphs.com/blogs/top-45-prospects-baltimore-orioles">BAL</a></td>\n<td><a href="http://www.fangraphs.com/blogs/top-34-prospects ...
[5] <table>\n<tr>\n<td><a href="http://www.fangraphs.com/blogs/top-30-prospects-atlanta-braves">ATL</a></td>\n<td><a href="http://www.fangraphs.com/blogs/top-49-prospects-ch ...
[6] <table>\n<tr>\n<td><a href="http://www.fangraphs.com/blogs/top-40-prospects-baltimore-orioles">BAL</a></td>\n<td><a href="http://www.fangraphs.com/blogs/top-38-prospects ...
[7] <table>\n<tr>\n<td><a href="http://www.fangraphs.com/blogs/top-27-prospects-atlanta-braves">ATL</a></td>\n<td><a href="http://www.fangraphs.com/blogs/top-41-prospects-ch ...
None of these 7 tables is the main table on the page, the one with all of the team stats. Running
url %>% read_html() %>% html_nodes('div.table-scroll')
returns an empty nodeset, even though div.table-scroll is the wrapper div that contains the main table.
Edit: I think this is the relevant network request, but I'm still not sure how to turn it into an API call. How can I see the full API call behind it?
The data is loaded dynamically through an API call, so it is not part of the HTML that read_html() downloads. Switch to httr, since you need to make a POST request whose body includes the start and end dates. Also, switch to infinite so that as much data as possible is returned with as few calls as possible.
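To see why rvest comes back empty, here is a minimal offline sketch. The page skeleton is made up (it is not the real fangraphs HTML) and `renderTable()` is a hypothetical script function: read_html() parses the markup exactly as the server sends it and never runs JavaScript, so any element a script would create is simply not there.

```r
library(rvest)

# Hypothetical skeleton of a JavaScript-rendered page: the server sends an
# empty container plus a script; renderTable() is a made-up function name.
static_html <- '<html><body>
  <div id="root"></div>
  <script>renderTable("root");</script>
</body></html>'

page <- read_html(static_html)

# rvest sees only the delivered markup; the script is never executed,
# so the wrapper div the browser would build does not exist here.
wrapper <- html_nodes(page, "div.table-scroll")
length(wrapper)
```

The empty container is found, but the wrapper is not, which matches the empty nodeset in the question.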
You will want to wrap the code below in a custom function that accepts the dates as arguments.
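library(httr)
library(purrr)

# Request headers: a browser-like user agent and a JSON content type
headers <- c(
  'user-agent' = 'Mozilla/5.0',
  'content-type' = 'application/json;charset=UTF-8'
)
# JSON body of the POST; strStartDate/strEndDate set the date range
data <- '{"strPlayerId":"all","strSplitArr":[1],"strGroup":"season","strPosition":"B","strType":"2","strStartDate":"2021-07-07","strEndDate":"2021-07-21","strSplitTeams":false,"dctFilters":[],"strStatType":"team","strAutoPt":"false","arrPlayerId":[],"strSplitArrPitch":[]}'
# POST to the splits API, then parse the JSON response into a list
resp <- httr::POST(url = 'https://www.fangraphs.com/api/leaders/splits/splits-leaders', httr::add_headers(.headers = headers), body = data)
r <- httr::content(resp)
# Each element of r$data is one row; bind them into a single data frame
df <- map_df(r$data, data.frame)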
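One way to turn this into a function that accepts date arguments is sketched below. It assumes the endpoint keeps accepting the same JSON fields as the captured request and that only `strStartDate` and `strEndDate` need to change; the names `build_splits_body` and `get_team_splits` are hypothetical. Building the body with `jsonlite::toJSON` instead of editing the string by hand avoids quoting mistakes; `I(1)` keeps `strSplitArr` a JSON array even with `auto_unbox = TRUE`.

```r
library(httr)
library(jsonlite)
library(purrr)

# Hypothetical helper: build the JSON body. Every field except the two dates
# is copied from the hard-coded payload in the answer.
build_splits_body <- function(start_date, end_date) {
  payload <- list(
    strPlayerId = "all",
    strSplitArr = I(1),                      # I() keeps this an array: [1]
    strGroup = "season",
    strPosition = "B",
    strType = "2",
    strStartDate = format(as.Date(start_date)),  # accepts strings or Dates
    strEndDate = format(as.Date(end_date)),
    strSplitTeams = FALSE,
    dctFilters = list(),                     # empty list serializes as []
    strStatType = "team",
    strAutoPt = "false",
    arrPlayerId = list(),
    strSplitArrPitch = list()
  )
  as.character(toJSON(payload, auto_unbox = TRUE))
}

# Hypothetical wrapper: POST the body and flatten the returned rows.
get_team_splits <- function(start_date, end_date) {
  resp <- POST(
    url = "https://www.fangraphs.com/api/leaders/splits/splits-leaders",
    add_headers(
      "user-agent" = "Mozilla/5.0",
      "content-type" = "application/json;charset=UTF-8"
    ),
    body = build_splits_body(start_date, end_date)
  )
  stop_for_status(resp)  # fail loudly on HTTP errors
  r <- content(resp)
  map_df(r$data, data.frame)
}
```

Usage: `df <- get_team_splits("2021-07-07", "2021-07-21")`. For the dates above, `build_splits_body()` should reproduce the original `data` string, so only the date range differs between calls.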