Reputation: 71
I am a statistician/data scientist, R user, runner, and a beginner in the realm of web scraping.
I recently completed a race in Tampa, FL and the results are posted online. I would like to use some web scraping methods in R to pull this data for some fun analytics.
My experience with web scraping in R is very limited. So far I have relied on the "SelectorGadget" extension in Google Chrome to identify the tags of the data I am trying to fetch.
For this page I am not having any luck. Any guidance on getting started would be appreciated.
URL: https://results2.xacte.com/#/e/2534/placings
Some of the code I have tried:
```r
library(rvest)
library(dplyr)

link <- "https://results2.xacte.com/#/e/2534/placings"
page <- read_html(link)

# CSS selector identified with SelectorGadget
test <- page %>% html_nodes("#place_1st .md-ink-ripple") %>% html_text()
```
`test` returns `character(0)`.
If anyone can guide me on how to pull some of this data into an object or data frame, it would be greatly appreciated. I am hoping to pull name, time, pace, etc. from this page. Any code to get me started would be enough for me to work on pulling more data and analyzing it.
Thank you.
Upvotes: 1
Views: 91
Reputation: 24139
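This is not a good page to learn how to scrape on. It uses JavaScript to render its content, so basic rvest, which only sees the static HTML, will not work. Look at the `read_html_live()` function (added in rvest 1.0.4), which returns a `LiveHTML` object backed by a real browser.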
Alternatively, if you open the browser's developer tools and look at the Network tab, you should see a request named "agegroup" that contains the information you are looking for. If you copy that request's URL, you can download the file, strip the JavaScript wrapper, and convert the JSON to a data frame.
```r
library(readr)     # read_file()
library(jsonlite)  # fromJSON()

# Download the file (URL copied from the browser's Network tab)
download.file("https://results.xacte.com/json/agegroup?categoryId=9828&eventId=2534&limit=250&offset=0&subeventId=6322&callback=angular.callbacks._2", "runner.txt")
# Read the data back in as a single string
data <- read_file("runner.txt")
# Remove the JavaScript callback wrapper at the beginning and end
data1 <- sub("angular\\.callbacks\\._2\\(", "", data)
data1 <- sub("\\)\\s*;?\\s*$", "", data1)
# Convert the JSON to a data frame
parsed <- fromJSON(data1)
df <- parsed$aaData
# Total number of records available
parsed$iTotalRecords
```
In the above script I changed the `limit` parameter from the default of 25 names to 250. It looks like there were over 3,500 participants, so you have room to increase that number if you want everyone. There are many columns with the various split times, etc.
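Because the URL also takes an `offset` parameter, another option is to request the results one page at a time and stack them. The sketch below assumes the response always has the shape used above (a JSONP callback wrapped around JSON with an `aaData` table) and that every page has the same columns; `strip_jsonp()`, `build_page_urls()`, `fetch_page()` and `fetch_all()` are hypothetical helper names, and the category/event/subevent IDs are just the ones from the URL above. It reads each page with base `readLines()` instead of saving it to a file first.

```r
# Hypothetical helper: remove the JSONP wrapper,
# e.g. 'angular.callbacks._2({...});' -> '{...}'
strip_jsonp <- function(txt) {
  txt <- sub("^[^(]*\\(", "", txt)   # drop everything up to the first "("
  sub("\\)\\s*;?\\s*$", "", txt)     # drop the trailing ")" or ");"
}

# Hypothetical helper: one request URL per page of `limit` results
build_page_urls <- function(total, limit = 250) {
  if (total <= 0) return(character(0))
  offsets <- seq(0, total - 1, by = limit)
  sprintf(paste0("https://results.xacte.com/json/agegroup?categoryId=9828",
                 "&eventId=2534&limit=%d&offset=%d&subeventId=6322",
                 "&callback=angular.callbacks._2"),
          as.integer(limit), as.integer(offsets))
}

# Download one page and return its aaData table (needs the jsonlite package)
fetch_page <- function(url) {
  txt <- paste(readLines(url, warn = FALSE), collapse = "\n")
  jsonlite::fromJSON(strip_jsonp(txt))$aaData
}

# Download every page and stack them into one data frame
fetch_all <- function(total, limit = 250) {
  pages <- lapply(build_page_urls(total, limit), fetch_page)
  jsonlite::rbind_pages(pages)
}
```

To use it, pass the `iTotalRecords` value as `total`, e.g. `all_runners <- fetch_all(total)`. Keeping `limit` moderate (rather than one huge request) is gentler on the server, at the cost of a few more requests.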
Note that the times are in raw form (milliseconds), so you will have to convert them to minutes.
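The conversion is simple arithmetic: there are 60,000 ms in a minute, and pace is minutes divided by distance. A minimal sketch; the function names are hypothetical, and since the column holding each time and the exact race distance aren't stated here, both are left as inputs.

```r
# Milliseconds -> decimal minutes (60 s * 1000 ms = 60000 ms per minute)
ms_to_minutes <- function(ms) as.numeric(ms) / 60000

# Milliseconds -> "h:mm:ss" string, rounded to the nearest second
ms_to_hms <- function(ms) {
  total_sec <- round(as.numeric(ms) / 1000)
  h <- total_sec %/% 3600
  m <- (total_sec %% 3600) %/% 60
  s <- total_sec %% 60
  sprintf("%d:%02d:%02d", as.integer(h), as.integer(m), as.integer(s))
}

# Pace in minutes per mile (or per km, if `distance` is given in km)
pace_per_unit <- function(ms, distance) ms_to_minutes(ms) / distance
```

For example, `df$finish_min <- ms_to_minutes(df[[time_col]])`, where `time_col` is a hypothetical placeholder for whichever column holds the finish time. `as.numeric()` is used because JSON fields sometimes arrive as character strings.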
Hope this helps.
Upvotes: 3