I'm new to R and web scraping. I'm trying to read a table from the World Bank website into R.
Here is the URL for one of the projects as an example (my goal is to read the table on the left under "Basic Information"): http://projects.worldbank.org/P156880/?lang=en&tab=details
I'm using Chrome's DevTools to find the CSS selector for that particular table.
Here is my code:
library(rvest)
url <- "http://projects.worldbank.org/P156880/?lang=en&tab=details"
details <- url %>%
read_html() %>%
html_nodes(css = '#projectDetails > div:nth-child(2) > div.column-left > table') %>%
html_table()
Unfortunately, I get an empty list:
> details
list()
Any help on how to resolve this would be greatly appreciated.
This site loads the table with a separate XMLHttpRequest (XHR) after the main page loads, which is why your selector finds nothing in the HTML that read_html() downloads. You can request that data directly with httr. Open Chrome DevTools, go to the Network tab, and then load your URL above. You will notice four other URLs are requested when the page loads; click on projectdetails? and you should see the HTML table in the Preview tab. Next, right-click on projectdetails?, choose Copy as cURL, paste the result into a text editor, and copy the URL, Referer, and X-Requested-With values into the httr GET() call below.
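The empty list in the question is what you would expect if the table only arrives through that separate request: the selector matches nothing in the static HTML, and html_table() over an empty node set returns list(). A minimal offline sketch of that effect, using a hypothetical HTML string as a stand-in for the downloaded page (it is not the real World Bank markup):

```r
library(rvest)

# Hypothetical stand-in for the downloaded page: the container is present,
# but the table itself would only be inserted later by JavaScript.
page <- read_html('<html><body><div id="projectDetails"></div></body></html>')

nodes <- html_nodes(page, css = "#projectDetails table")
length(nodes)       # 0: the selector matches nothing in the static HTML
html_table(nodes)   # list(), the same empty result as in the question
```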
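library(httr)
library(rvest)
# Request the XHR endpoint directly, sending the same headers the browser sends
res <- GET(
  url = "http://projects.worldbank.org/p2e/projectdetails?projId=P156880&lang=en",
  add_headers(
    Referer = "http://projects.worldbank.org/P156880/?lang=en&tab=details",
    `X-Requested-With` = "XMLHttpRequest"
  )
)
# content() parses the HTML response; then extract the first table from it
content(res) %>% html_node("table") %>% html_table(header = TRUE)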
Project ID P156880
1 Status Active
2 Approval Date December 14, 2017
3 Closing Date December 15, 2023
4 Country Colombia
5 Region Latin America and Caribbean
6 Environmental Category B
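With header = TRUE, html_table() promotes the first row of the table (Project ID / P156880) to column names, which is why the output above is headed that way. For a key/value table like this one, reading it with header = FALSE and turning the two columns into a named vector may be more convenient. A small offline sketch, assuming a hypothetical, trimmed HTML fragment shaped like the XHR response (values taken from the output above):

```r
library(rvest)

# Hypothetical, trimmed stand-in for the projectdetails response:
# a two-column key/value table with no <th> header row.
fragment <- read_html(
  "<table>
     <tr><td>Project ID</td><td>P156880</td></tr>
     <tr><td>Status</td><td>Active</td></tr>
     <tr><td>Country</td><td>Colombia</td></tr>
   </table>"
)

# header = FALSE keeps "Project ID" as an ordinary data row
tbl <- html_table(html_node(fragment, "table"), header = FALSE)

# First column holds the keys, second column the values
info <- setNames(tbl[[2]], tbl[[1]])
info[["Country"]]   # "Colombia"
```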
Or wrap this in a function that works for any project ID:
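get_project <- function(id) {
  # Passing the query as a list lets httr build and URL-encode the query string
  res <- GET(
    url = "http://projects.worldbank.org",
    path = "p2e/projectdetails",
    query = list(projId = id, lang = "en"),
    add_headers(
      Referer = paste0("http://projects.worldbank.org/", id, "/?lang=en&tab=details"),
      `X-Requested-With` = "XMLHttpRequest"
    )
  )
  # Raise an R error on an HTTP error status instead of parsing an error page
  stop_for_status(res)
  content(res) %>% html_node("table") %>% html_table(header = TRUE)
}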
get_project("P156880")
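If a request fails or returns something unexpected, it can help to check the exact URL being requested. httr's modify_url() builds a URL from a base plus path and query parts without sending anything; the sketch below (project_url() is a hypothetical helper, not part of httr) reproduces the endpoint URL that get_project() fetches:

```r
library(httr)

# Hypothetical helper: build the XHR endpoint URL for a project ID
# without sending any request (modify_url() only manipulates the string).
project_url <- function(id) {
  modify_url(
    "http://projects.worldbank.org",
    path  = "p2e/projectdetails",
    query = list(projId = id, lang = "en")
  )
}

project_url("P156880")
# "http://projects.worldbank.org/p2e/projectdetails?projId=P156880&lang=en"
```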