Wookeun Lee

Reputation: 463

How can I handle errors inside a loop in R?

I am having trouble handling errors inside a for loop.

In the code below, I want to scrape data tables from a list of pages and combine them into one data frame.

Some of the links in data$address do not work, so read_html throws an error and the loop stops partway through. The error occurs at this line: doc = read_html(i, encoding = 'UTF-8')

How can I skip the broken links and continue iterating over the whole vector?

fdata = data.frame()
n = 1
for (i in data$address) {
  doc = read_html(i, encoding = 'UTF-8')
  dtable = doc %>% 
    html_table()
  fdata = bind_rows(fdata, dtable)
  len = length(data$address)
  print(n/len*100)
  n = n + 1
}

Upvotes: 0

Views: 47

Answers (2)

Jake Kaupp

Reputation: 8072

You can also use possibly from purrr to return a default value instead of an error, write a function that scrapes one table, then iterate and bind the results with map_dfr.

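library(purrr)
library(rvest)
library(dplyr)  # for bind_rows

# possibly() returns NULL instead of raising an error when read_html fails
read_possible <- possibly(read_html, otherwise = NULL)

scrape_table <- function(address) {

  doc <- read_possible(address, encoding = 'UTF-8')

  if (is.null(doc)) {
    NULL  # failed link: map_dfr drops NULL results
  } else  {
    bind_rows(html_table(doc))  # html_table returns a list of tables
  }

}

map_dfr(data$address, scrape_table)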
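
To see what possibly does without touching the network, here is a minimal offline sketch. The function fake_read and the example addresses are hypothetical stand-ins for read_html and data$address; only possibly, map and compact come from purrr.

```r
library(purrr)

# Hypothetical stand-in for read_html(): fails on addresses containing "bad"
fake_read <- function(address) {
  if (grepl("bad", address)) stop("cannot open ", address)
  data.frame(address = address, stringsAsFactors = FALSE)
}

# possibly() wraps the function so that an error yields `otherwise` instead
safe_read <- possibly(fake_read, otherwise = NULL)

addresses <- c("page-1", "bad-link", "page-2")
results <- map(addresses, safe_read)

# compact() drops the NULL entries left by failed calls,
# then the remaining data frames are stacked row-wise
combined <- do.call(rbind, compact(results))
```

The failed address simply leaves a NULL in results, so the iteration always reaches the end of the vector.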

Upvotes: 1

niko

Reputation: 5281

Wrapping the call in try and skipping to the next iteration when it returns an error will do, e.g.

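fdata = data.frame()
len = length(data$address)
n = 0
for (i in data$address) {
  n = n + 1  # count every address, including the ones that fail
  # try() returns an object of class 'try-error' instead of stopping the loop
  doc = try(read_html(i, encoding = 'UTF-8'), silent = TRUE)
  if (inherits(doc, 'try-error')) next  # skip links that cannot be read
  dtable = doc %>% 
    html_table()
  fdata = bind_rows(fdata, dtable)
  print(n/len*100)
}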
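
A variant of the same idea uses base R's tryCatch, which also makes it easy to record which links failed. This is only a sketch under assumptions: fake_read and the example addresses are hypothetical stand-ins for read_html and data$address, so the snippet runs offline.

```r
# Hypothetical stand-in for read_html(): fails on addresses containing "bad"
fake_read <- function(address) {
  if (grepl("bad", address)) stop("cannot open ", address)
  data.frame(address = address, stringsAsFactors = FALSE)
}

addresses <- c("page-1", "bad-link", "page-2")
tables <- list()
failed <- character(0)

for (a in addresses) {
  doc <- tryCatch(
    fake_read(a),
    error = function(e) NULL  # turn any error into NULL
  )
  if (is.null(doc)) {
    failed <- c(failed, a)    # remember the link so it can be retried later
    next
  }
  tables[[length(tables) + 1]] <- doc
}

# Stack the tables that were read successfully
fdata <- do.call(rbind, tables)
```

After the loop, failed holds the addresses that could not be read, and fdata contains the rows from the rest.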

Upvotes: 1
