Canovice

Reputation: 10441

Scraping table within iframe using R rvest library

I am decent with R's rvest library for scraping websites, but am struggling with something new. From this webpage - http://www.naia.org/ViewArticle.dbml?ATCLID=205323044 - I am trying to scrape the main table of colleges.

Here is what my code looks like currently:

library(rvest)

NAIA_url <- "http://www.naia.org/ViewArticle.dbml?ATCLID=205323044"
NAIA_page <- read_html(NAIA_url)

tables <- html_table(html_nodes(NAIA_page, 'table'))
# tables is a length-2 list, but neither element is the table I want.

# grab the correct iframe node (the third one on the page)
iframe <- html_nodes(NAIA_page, "iframe")[3]

However, I'm stuck past this point. (1) For some reason, calling html_nodes on the page isn't grabbing the table I want, and (2) I'm not sure whether I should instead grab the iframe and then extract the table from within it.

Any help appreciated!

Upvotes: 2

Views: 2783

Answers (1)

yeedle

Reputation: 5008

If the iframe embeds an HTML page, you can read the URL in the iframe's src attribute and extract the table you want from that document.


library(rvest)
#> Loading required package: xml2
library(magrittr)
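"http://www.naia.org/ViewArticle.dbml?ATCLID=205323044" %>%
  read_html() %>%                        # parse the parent page
  html_nodes("iframe") %>%               # all iframes on the page
  extract(3) %>%                         # magrittr alias for `[`: keep the third iframe
  html_attr("src") %>%                   # URL of the embedded document
  read_html() %>%                        # fetch and parse that document
  html_node("#searchResultsTable") %>%   # select the table by its id
  html_table() %>%
  head()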
#>                                   College or University       City, State
#> 1                   Central Christian College ATHLETICS     McPherson, KS
#> 2 +                   Crowley's Ridge College ATHLETICS     Paragould, AR
#> 3                       Edward Waters College ATHLETICS  Jacksonville, Fl
#> 4                 Fisher College ADMISSIONS | ATHLETICS        Boston, MA
#> 5       Georgia Gwinnett College ADMISSIONS | ATHLETICS Lawrenceville, GA
#> 6   Lincoln Christian University ADMISSIONS | ATHLETICS       Lincoln, IL
#>   Conference Enrollment
#> 1     A.I.I.        259
#> 2     A.I.I.          0
#> 3     A.I.I.        805
#> 4     A.I.I.        600
#> 5     A.I.I.      9,720
#> 6     A.I.I.      1,060
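
As for why html_nodes(NAIA_page, 'table') doesn't find it: an iframe's content is a separate document that the browser loads from the src URL, so it isn't part of the parent page's HTML that read_html() parses. Here is a minimal offline sketch of that idea. The two HTML strings, the example.com URLs, and the single table row are made-up stand-ins, not the real NAIA pages, and the url_absolute() step is only needed if a src turns out to be relative.

```r
library(rvest)

# Hypothetical parent page: it has an iframe but no table markup of its own
outer_html <- '<html><body>
  <iframe src="/frames/colleges.html"></iframe>
</body></html>'

# Hypothetical document that would be served at the iframe src
inner_html <- '<html><body>
  <table id="searchResultsTable">
    <tr><th>College</th><th>Enrollment</th></tr>
    <tr><td>Fisher College</td><td>600</td></tr>
  </table>
</body></html>'

outer <- read_html(outer_html)

# The parent document contains no <table> nodes at all
n_tables_outer <- length(html_nodes(outer, "table"))

# The iframe only carries a pointer to the other document
frame_src <- html_attr(html_nodes(outer, "iframe"), "src")

# Resolve a relative src against the (assumed) URL of the parent page
base_url  <- "http://example.com/ViewArticle.html"
frame_url <- xml2::url_absolute(frame_src, base_url)

# In real use this would be read_html(frame_url); here we parse the stand-in string
inner  <- read_html(inner_html)
result <- html_table(html_node(inner, "#searchResultsTable"))
```

Note that this only works when the iframe source is plain HTML. If the table is built by JavaScript after the page loads, read_html() won't see it there either.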

Upvotes: 6
