Reputation: 25
Apologies if I have missed a previous topic on this matter. I want to scrape this website: http://www.fao.org/countryprofiles/en/. In particular, the page contains many links to country information pages, whose URLs have this structure:
http://www.fao.org/countryprofiles/index/en/?iso3=KAZ
http://www.fao.org/countryprofiles/index/en/?iso3=AFG
Each of these pages includes a News section that I am interested in. Of course, I could scrape them page by page, but that would be a waste of time.
I tried the following, but it is not working:
library(rvest)

# country link texts from the top-level page
countries <- read_html("http://www.fao.org/countryprofiles/en/") %>%
  html_nodes(".linkcountry") %>%
  html_text()

country_news <- list()
sub <- html_session("http://www.fao.org/countryprofiles/en/")

# follow each country link by its text and grab the News section
for (i in countries[1:100]) {
  page <- sub %>%
    follow_link(i) %>%
    read_html()
  country_news[[i]] <- page %>%
    html_nodes(".white-box") %>%
    html_text()
}
Any idea?
Upvotes: 0
Views: 72
Reputation: 34763
You can get all of the child pages from the top-level page:
stem = 'http://www.fao.org'
top_level = paste0(stem, '/countryprofiles/en/')
# select the href attribute of every link whose URL contains "?iso3="
all_children = read_html(top_level) %>%
  # ? and = are required to skip /iso3list/en/
  html_nodes(xpath = '//a[contains(@href, "?iso3=")]/@href') %>%
  html_text %>% paste0(stem, .)
head(all_children)
# [1] "http://www.fao.org/countryprofiles/index/en/?iso3=AFG"
# [2] "http://www.fao.org/countryprofiles/index/en/?iso3=ALB"
# [3] "http://www.fao.org/countryprofiles/index/en/?iso3=DZA"
# [4] "http://www.fao.org/countryprofiles/index/en/?iso3=AND"
# [5] "http://www.fao.org/countryprofiles/index/en/?iso3=AGO"
# [6] "http://www.fao.org/countryprofiles/index/en/?iso3=ATG"
If you are not comfortable with XPath, the CSS version would be:
all_children = read_html(top_level) %>%
  html_nodes('a') %>% html_attr('href') %>%
  grep("?iso3=", ., value = TRUE, fixed = TRUE) %>% paste0(stem, .)
Now you can loop over those pages and extract what you want.
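For example, here is a minimal sketch of that loop, reusing the .white-box selector from the question (verify that it actually matches the News section on the country pages):

library(rvest)

# fetch each child page and pull out the News section text
country_news = lapply(all_children, function(url) {
  read_html(url) %>%
    html_nodes('.white-box') %>%  # selector taken from the question; adjust if needed
    html_text(trim = TRUE)
})
# label each result with its ISO3 code, extracted from the URL
names(country_news) = sub('.*iso3=', '', all_children)

With roughly 200 requests, it is also polite to add a short Sys.sleep() inside the function so you do not hammer the server.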
Upvotes: 1