AaronT

Reputation: 1

Reading NCAA football statistics into R from a table that is not actually on the page

I am trying to collect college football data from every D1 school, and some schools have this funky table on their stats page that I can't read into R to extract the data I want. For most pages I have used readHTML, but for a page like this:

http://mutigers.com/cumestats.aspx?path=football&

I have no idea how to get the data I want, which is in the "Overall Individual Statistics" category. Any clue as to how to get this to work?

Upvotes: 0

Views: 50

Answers (1)

hrbrmstr

Reputation: 78832

I'm pretty sure this has something to do with the site filtering requests by user agent. Give this a go. It uses rvest and httr (for the user_agent function) to fetch the page, and you can then target individual tables with CSS selectors:

library(xml2)
library(httr)
library(rvest)

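# browser-like user agent string, since the site appears to filter on it
UA <- "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36"

# note: rvest >= 1.0.0 deprecates html_session() in favor of session()
pg <- html_session("http://mutigers.com/cumestats.aspx?path=football&", user_agent(UA))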

# get all "id'd" <table>s
unique(as.character(na.omit(html_attr(html_nodes(pg, "table"), "id"))))
##  [1] "individual_rushing_tbl"       "individual_passing_tbl"      
##  [3] "individual_receiving_tbl"     "individual_puntreturns_tbl"  
##  [5] "individual_interceptions_tbl" "individual_kickreturns_tbl"  
##  [7] "individual_scoring_tbl"       "individual_punting_tbl"      
##  [9] "overall_defensive_tbl"        "ind_offense_stats"           
## [11] "ind_defense_stats"            "ind_kick_stats"

html_table(html_nodes(pg, "table#individual_rushing_tbl"))[[1]]
##                RUSHING GP Att Gain Loss Net  Avg TD Long  Avg/G
## 1          Witter, Ish  6  81  351   27 324  4.0  1   27  54.00
## 2           Mauk, Maty  4  36  179   34 145  4.0  1   24  36.25
## 3  Hansbrough, Russell  5  33  174    8 166  5.0  0   26  33.20
## 4          Hunt, Tyler  6  14   54    0  54  3.9  0   11   9.00
## 5     Abbington, Chase  5   6   40    1  39  6.5  0   12   7.80
## 6           Lock, Drew  6  14   33   66 -33 -2.4  0   11  -5.50
## 7      Steward, Morgan  3  10   21    3  18  1.8  0    6   6.00
## 8                 Team  4   8    0   10 -10 -1.3  0    0  -2.50
## 9                TOTAL  6 202  852  149 703  3.5  2   27 117.17
## 10           Opponents  6 236  879  203 676  2.9  4   29 112.67

html_table(html_nodes(pg, "table#individual_passing_tbl"))[[1]]
##      PASSING GP Efficiency Comp-Att-Int     Pct Yards TD Long  Avg/G
## 1 Mauk, Maty  4     112.49     57-110-4 51.82 %   654  6   51 163.50
## 2 Lock, Drew  6     107.51      52-92-3 56.52 %   512  3   78  85.33
## 3      TOTAL  6     110.22    109-202-7 53.96 %  1166  9   78 194.33
## 4  Opponents  6     109.84    102-170-7 60.00 %   979  5   35 163.17
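
If you'd rather grab every id'd table in one pass instead of one selector at a time, something along these lines might work. This is only a rough sketch: the inline HTML below is a made-up stand-in for the real page (so it runs without hitting the site), and the `stats` list name is just my choice. With the live page you would pass `pg` in place of `doc`, and since the id listing above needed `unique()`, expect that some ids may repeat.

```r
library(rvest)

# Hypothetical, trimmed-down stand-in for the downloaded page
html <- '<html><body>
<table id="individual_rushing_tbl">
<tr><th>RUSHING</th><th>GP</th><th>Att</th></tr>
<tr><td>Witter, Ish</td><td>6</td><td>81</td></tr>
</table>
<table><tr><td>layout only, no id</td></tr></table>
<table id="individual_passing_tbl">
<tr><th>PASSING</th><th>GP</th><th>Yards</th></tr>
<tr><td>Mauk, Maty</td><td>4</td><td>654</td></tr>
</table>
</body></html>'

doc <- read_html(html)

# "table[id]" matches only <table> elements that carry an id attribute
tbls <- html_nodes(doc, "table[id]")

# one data frame per table, named after the table's id
stats <- html_table(tbls)
names(stats) <- html_attr(tbls, "id")

names(stats)
stats[["individual_rushing_tbl"]]
```

From there `stats[["individual_passing_tbl"]]` and friends are ordinary data frames you can clean up or bind together.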

Upvotes: 2
