Reputation: 21
I'm trying to scrape the following website for MLB draft data:
https://www.baseballamerica.com/draft-history/mlb-draft-database/#/
The issue is that I can't seem to find the correct class to input into rvest::html_nodes() in order to isolate the table. Using Chrome's "Inspect" tool, I've tried each of the classes that would seemingly identify the table:
library(tidyverse)
library(rvest)
url <- "https://www.baseballamerica.com/draft-history/mlb-draft-database/#/"
url %>%
read_html() %>%
html_nodes("table-container")
I've also tried "search-table draft-search-table", but I keep getting the same result: "{xml_nodeset (0)}". Any help would be greatly, greatly appreciated!
Upvotes: 2
Views: 128
Reputation: 84465
The table is loaded dynamically from an API call that returns JSON, so it is not in the HTML that read_html() downloads. You can send a POST request to that API with httr to get the data:
library(httr)
# Request header and JSON search filters (here: 2019 draft, round 1)
headers <- c('Content-Type' = 'application/json')
data <- '{"SigningBonusMin":"0","SigningBonusMax":"0","Year":"2019","Round":"1","TeamId":"0","FourYearSchoolType":"false","JuniorCollegeType":"false","HighSchoolType":"false","OtherSchoolType":"false","OverallNumber":"0","pageId":"1","paid":"false"}'
# POST the filters to the API endpoint
resp <- httr::POST(url = 'https://www.baseballamerica.com/umbraco/api/draftdatabaseapi/advancedsearch', httr::add_headers(.headers = headers), body = data, encode = "json")
# Parse the JSON response and keep its "Results" element
r <- content(resp)$Results
print(r)
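If you want a table rather than a nested list, one option is to flatten r yourself. The following is only a sketch: results_to_df is a hypothetical helper (not part of httr), and it assumes r is a list of records with mostly scalar fields. I have not checked the field names the API actually returns, so the helper discovers them from the data. NULL or nested values become NA.

```r
# Hypothetical helper: turn a list of records (named lists) into a data frame.
# Assumes each field holds a single value; NULL or nested values become NA.
results_to_df <- function(results) {
  # Collect every field name that appears in any record
  fields <- unique(unlist(lapply(results, names)))
  # Build one column per field by pulling that field out of each record
  cols <- lapply(fields, function(f) {
    sapply(results, function(rec) {
      v <- rec[[f]]
      if (is.null(v) || is.list(v) || length(v) != 1) NA else v
    })
  })
  names(cols) <- fields
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Usage with the object from the request above (assumed to be a list of records):
# draft_df <- results_to_df(r)
```

Building column by column avoids rbind-ing many one-row data frames and lets R pick a common type for each column.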
Upvotes: 2