Reputation: 23
This may be covered in other posts, but I cannot find a solution to my issue. I am trying to scrape data from the https://tradingeconomics.com/indicators website: specifically, for a given indicator, the country names and the plot shown on each country's page.
library(rvest)

# Helper (not called below): collects the link text matched by the selector.
tradec = function(tradelink) {
  trade_page = read_html(tradelink)
  trade_element = trade_page %>% html_nodes(".primary_photo+ td a") %>%
    html_text() %>% paste(collapse = ",")
  return(trade_element)
}

# Read the country names and turn them into hyphenated URL segments.
main_page <- read_html("https://tradingeconomics.com/country-list/gdp-growth-rate")
country_list <- main_page %>%
  html_nodes("#ctl00_ContentPlaceHolder1_ctl01_UpdatePanel1 a") %>%
  html_text() %>%
  trimws %>%
  gsub(" ", "-", .)

# Visit each country page and collect the country name and chart element.
tradec_df = data.frame()
for (i in country_list) {
  link = paste0("https://tradingeconomics.com/", i , "/gdp-growth")
  page = read_html(link)
  country = page %>% html_nodes("#SelectCountries") %>% html_text()
  tradec_charts = page %>% html_nodes("#ImageChart") %>% html_text
  tradec_df = rbind(tradec_df, data.frame(country, tradec_charts, stringsAsFactors = FALSE))
  print(paste("Page:", i))  # print the current country, not the whole list
}
Ideally, I would like one page printed per country, showing the country name and the plot. I am fairly sure the plots can be scraped and displayed somehow, but I have no idea how. Any suggestions?
Upvotes: 0
Views: 337
Reputation: 52468
It's not working because each element in the country_list variable is padded with whitespace (carriage returns, newlines, and spaces), which is not valid in a URL:
[1] "\r\n South Africa\r\n "
[2] "\r\n Peru\r\n "
[3] "\r\n Botswana\r\n "
So all you need to do is remove those characters with trimws(), so they look like this instead:
country_list
[1] "South Africa" "Peru" "Botswana" "India" "Turkey"
[6] "New Zealand" "Argentina" "Malta" "Slovenia" "El Salvador"
[11] "Ireland" "Rwanda" "Albania" "Luxembourg" "Nigeria"
[16] "Canada" "Jamaica" "Uruguay" "Brazil" "Paraguay"
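To see the cleanup in isolation, here is a minimal sketch in base R. The raw strings are copied from the output above. The lower-case, hyphenated URL form is my assumption about how the site names its country pages (the hyphen step comes from the question's gsub() call), so check it against a real link before relying on it.

```r
# Sample of the raw link text, as printed above.
raw_names <- c("\r\n South Africa\r\n ", "\r\n Peru\r\n ", "\r\n Botswana\r\n ")

# trimws() strips leading and trailing spaces, tabs, newlines and carriage returns.
clean_names <- trimws(raw_names)

# Assumption: country pages use lower-case, hyphen-separated names in the URL.
slugs <- tolower(gsub(" ", "-", clean_names, fixed = TRUE))

links <- paste0("https://tradingeconomics.com/", slugs, "/gdp-growth")
print(links)
```

Spaces inside a name (as in "South Africa") survive trimws(), which is why the separate gsub() step is still needed before building the link.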
The code below works. The only line I changed was to add the pipe to trimws():
library(tidyverse)
library(rvest)

tradec = function(tradelink) {
  trade_page = read_html(tradelink)
  trade_element = trade_page %>% html_nodes(".primary_photo+ td a") %>%
    html_text() %>% paste(collapse = ",")
  return(trade_element)
}

main_page <- read_html("https://tradingeconomics.com/country-list/gdp-growth-rate")
country_list <- main_page %>%
  html_nodes("#ctl00_ContentPlaceHolder1_ctl01_UpdatePanel1 a") %>%
  html_text() %>%
  trimws

tradec_df = data.frame()
for (i in country_list) {
  link = paste0("https://tradingeconomics.com/", i , "/gdp-growth")
  page = read_html(link)
  country = page %>% html_nodes("#SelectCountries") %>% html_text()
  tradec_links = page %>% html_nodes("#ImageChart") %>% html_text
}
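On the plot part of the question: html_text() only returns the text inside an element, so it will not give you a chart. If #ImageChart turns out to be an <img> element (an assumption on my part, I have not checked the page markup), the image address should be in its src attribute, which html_attr() can read. A rough sketch on a made-up HTML snippet, where the markup and the example.com URL are hypothetical stand-ins:

```r
library(rvest)

# Hypothetical stand-in for a country page; the real markup may differ.
sample_html <- '<html><body>
  <img id="ImageChart" src="https://example.com/charts/peru-gdp-growth.png">
</body></html>'

page <- read_html(sample_html)

# An <img> has no text content, so html_text() returns an empty string here.
chart_text <- page %>% html_nodes("#ImageChart") %>% html_text()

# The image location is stored in the src attribute instead.
chart_url <- page %>% html_nodes("#ImageChart") %>% html_attr("src")
print(chart_url)

# With a real URL, the file could then be saved locally, for example:
# download.file(chart_url, destfile = "peru-gdp-growth.png", mode = "wb")
```

Inside the loop you would store chart_url per country in place of the html_text() result, then download or embed the images when building the per-country pages.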
Upvotes: 1