Helena

Reputation: 87

R parsing multiple tables from a link page

I see there are many related posts, but I can't find a solution to this (surely my fault).

I am trying to scrape the tables published on https://tradingeconomics.com and, being a newbie at this, I am running into issues.

I would like to get all the tables, plus the sub-tables by continent shown in the menu above each page.

I have tried collecting all the links in an R vector and then scraping from there, but without any success:

library(rvest)  # read_html(), html_nodes(), html_text()
library(XML)    # readHTMLTable()

## 00. Importing the main link
trading_ec <- read_html("https://tradingeconomics.com/indicators")


## 01. Scraping the variables names
tr_ec_tabs <- trading_ec %>%
  html_nodes(".list-group-item a") %>%
  html_text(trim=TRUE)


## 02. Editing the vector 
tr_ec_tabs_lo <- tolower(tr_ec_tabs)
tr_ec_nospace <- gsub(" ", "-", tr_ec_tabs_lo)


## 03. Creating a .json indicators vector
json.indicators <- paste0("https://tradingeconomics.com/country-list/", tr_ec_nospace)

## 04. Function
table <- list()
for(i in seq_along(json.indicators))
{
  total_list <- readHTMLTable(json.indicators[i])
  n.rows <- unlist(lapply(total_list, function(t) dim(t)[1]))
  table[[i]] <- as.data.frame(total_list[[which.max(n.rows)]])
}

Upvotes: 0

Views: 77

Answers (1)

Allan Cameron
Allan Cameron

Reputation: 173803

If you replace your loop with

table <- list()
for(i in seq_along(json.indicators[-102]))
{
  table[[i]] <- html_table(read_html(json.indicators[i]))[[1]]
  cat("Page", i, "of", length(json.indicators[-102]), "obtained.\n")
}

You get a nice list of data frames. You have to drop index 102 because it links to a page without a table. Because the loop takes a while to run, I have added a cat() statement so you can see how many pages you have scraped and how many remain.
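If you would rather not hard-code index 102, a sketch like the following (untested against the live site, and assuming the same `json.indicators` vector and an rvest session) wraps each request in tryCatch() so any page without a table is simply skipped:

```r
library(rvest)

tables <- list()
for (i in seq_along(json.indicators)) {
  tables[[i]] <- tryCatch(
    html_table(read_html(json.indicators[i]))[[1]],
    error = function(e) NULL  # a page with no table returns NULL instead of stopping the loop
  )
  cat("Page", i, "of", length(json.indicators), "attempted.\n")
}
tables <- Filter(Negate(is.null), tables)  # drop the pages that failed
```

This way the loop survives if the site adds or removes table-less pages later, at the cost of silently skipping genuine network errors as well.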

Upvotes: 1
