I am trying to scrape college football data for every D1 school. Some schools have an unusual table on their stats page, and I can't get R to extract the data I want from it. For most pages I have used readHTML, but for a page like this one:

http://mutigers.com/cumestats.aspx?path=football&

I have no idea how to get the data I want, which is in the "Overall Individual Statistics" category. Any idea how to get this to work?
I'm fairly sure the site is filtering requests by user-agent. Give this a go: it uses rvest and httr (for the `user_agent()` function) to fetch the data, and you can then target each table with CSS selectors.
```r
library(xml2)
library(httr)
library(rvest)

# Browser-like user-agent string; the site appears to reject R's default one
UA <- "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36"

# Fetch the page while sending that user-agent
# (in rvest >= 1.0, html_session() is deprecated in favour of session())
pg <- html_session("http://mutigers.com/cumestats.aspx?path=football&", user_agent(UA))

# List the ids of every <table> that has one
unique(as.character(na.omit(html_attr(html_nodes(pg, "table"), "id"))))
## [1] "individual_rushing_tbl" "individual_passing_tbl"
## [3] "individual_receiving_tbl" "individual_puntreturns_tbl"
## [5] "individual_interceptions_tbl" "individual_kickreturns_tbl"
## [7] "individual_scoring_tbl" "individual_punting_tbl"
## [9] "overall_defensive_tbl" "ind_offense_stats"
## [11] "ind_defense_stats" "ind_kick_stats"

# Select a table by its id with a CSS selector and parse it into a data frame
html_table(html_nodes(pg, "table#individual_rushing_tbl"))[[1]]
## RUSHING GP Att Gain Loss Net Avg TD Long Avg/G
## 1 Witter, Ish 6 81 351 27 324 4.0 1 27 54.00
## 2 Mauk, Maty 4 36 179 34 145 4.0 1 24 36.25
## 3 Hansbrough, Russell 5 33 174 8 166 5.0 0 26 33.20
## 4 Hunt, Tyler 6 14 54 0 54 3.9 0 11 9.00
## 5 Abbington, Chase 5 6 40 1 39 6.5 0 12 7.80
## 6 Lock, Drew 6 14 33 66 -33 -2.4 0 11 -5.50
## 7 Steward, Morgan 3 10 21 3 18 1.8 0 6 6.00
## 8 Team 4 8 0 10 -10 -1.3 0 0 -2.50
## 9 TOTAL 6 202 852 149 703 3.5 2 27 117.17
## 10 Opponents 6 236 879 203 676 2.9 4 29 112.67

html_table(html_nodes(pg, "table#individual_passing_tbl"))[[1]]
## PASSING GP Efficiency Comp-Att-Int Pct Yards TD Long Avg/G
## 1 Mauk, Maty 4 112.49 57-110-4 51.82 % 654 6 51 163.50
## 2 Lock, Drew 6 107.51 52-92-3 56.52 % 512 3 78 85.33
## 3 TOTAL 6 110.22 109-202-7 53.96 % 1166 9 78 194.33
## 4 Opponents 6 109.84 102-170-7 60.00 % 979 5 35 163.17
```
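If you want every id'd table rather than one selector at a time, you can build the selectors from the id list. Below is a minimal offline sketch: the HTML string is a made-up stand-in for the stats page (only the id naming mirrors the real page), so it runs without network access. On the real page you would swap in the `pg` session from the code above for `doc`.

```r
library(xml2)
library(rvest)

# Hypothetical stand-in for the stats page: two id'd <table>s plus one
# layout table without an id. No network access needed.
html <- '<html><body>
<table id="individual_rushing_tbl">
  <tr><th>RUSHING</th><th>GP</th><th>Net</th></tr>
  <tr><td>Witter, Ish</td><td>6</td><td>324</td></tr>
  <tr><td>Mauk, Maty</td><td>4</td><td>145</td></tr>
</table>
<table id="individual_passing_tbl">
  <tr><th>PASSING</th><th>GP</th><th>Yards</th></tr>
  <tr><td>Mauk, Maty</td><td>4</td><td>654</td></tr>
</table>
<table><tr><td>layout table without an id</td></tr></table>
</body></html>'

doc <- read_html(html)

# Collect the ids of every <table> that has one (html_attr gives NA otherwise)
ids <- html_attr(html_nodes(doc, "table"), "id")
ids <- unique(ids[!is.na(ids)])

# Parse each id'd table via its CSS selector into a named list of data frames
stats <- lapply(ids, function(id) {
  html_table(html_node(doc, paste0("table#", id)))
})
names(stats) <- ids

str(stats$individual_rushing_tbl)
```

Keeping the tables in a list named by id means you can grab, for example, `stats$individual_passing_tbl` directly, and the id-less layout table is skipped automatically.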