Reputation: 253
I want to scrape the statistics from this page:
url <- "http://www.pgatour.com/players/player.20098.stuart-appleby.html/statistics"
Specifically, I want to grab the data in the table that's underneath Stuart's headshot. It's headlined by "Stuart Appleby - 2015 STATS PGA TOUR"
I tried using rvest, in combination with SelectorGadget (http://selectorgadget.com/).
url_html <- url %>% html()
url_html %>%
html_nodes(xpath = '//*[(@id = "playerStats")]//td')
'Should' get me the table without, for example, the row on top that says "Recap -- Rank -- Additional Stats"
url_html <- url %>% html()
url_html %>%
html_nodes(xpath = '//*[(@id = "playerStats")] | //th//*[(@id = "playerStats")]//td')
'Should' get me the table with that "Recap -- Rank -- Add'l Stats" line.
Neither do.
Obvs I'm a complete newb when it comes to web scraping. When I click on 'view source' for that webpage, the data contained in the table isn't there.
In the source code, where I think the table should be starting, is this bit of code:
<script id="playerStatsTourTemplate" type="text/x-jquery-tmpl">
{{each(t, tour) tours}}
{{if pgatour.players.shouldProcessTour(tour.tourCodeLC)}}
<div class="statistics-head">
<h2 class="title">Stuart Appleby - <b>${year} STATS
.
.
.
So it appears the table is stored somewhere (JSON? jQuery? JavaScript? Are those terms applicable here?) that isn't accessible to the html() function. Is there any way to use rvest to grab this data? Is there an rvest equivalent for grabbing data that is stored in this manner?
Thanks.
Upvotes: 1
Views: 3938
Reputation: 898
Check this out.
Open source project on GitHub scraping PGA data: https://github.com/zachwill/golf/blob/master/pga.py
Upvotes: 1
Reputation: 6659
I'd probably use the GET request that the page is making to get the raw data from their API and work on parsing that...
content(a) gives you a list representation (basically the output from fromJSON()), or as(a, "character") gives you the raw JSON.
library("httr")
a <- GET("http://www.pgatour.com/data/players/20098/2014stat.json")
content(a)
as(a, "character")
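Once you have the JSON as text, something along these lines should turn it into a data frame you can work with. This is only a sketch: the sample JSON and its field names (stats, name, value, rank) are made up for illustration, since the real layout of the feed isn't shown here. Run str() on the real response first and adjust the names to match.

```r
library(jsonlite)

# Hypothetical stand-in for the API response. The real field names in the
# PGA Tour feed are an assumption here and will almost certainly differ.
sample_json <- '{"stats": [
  {"name": "Driving Distance", "value": "290.1", "rank": "85"},
  {"name": "Scoring Average",  "value": "71.2",  "rank": "120"}
]}'

# fromJSON() simplifies an array of same-shaped objects into a data frame
parsed <- fromJSON(sample_json)

# Inspect the structure before extracting anything
str(parsed, max.level = 2)

# Pull out the (hypothetical) stats table and convert text columns to numbers
stats_df <- parsed$stats
stats_df$value <- as.numeric(stats_df$value)
stats_df$rank  <- as.integer(stats_df$rank)
```

With the real response you would pass the text from httr instead of the sample string, e.g. fromJSON(content(a, as = "text")).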
Upvotes: 2