Reputation: 463
I have a problem handling errors in a for loop.
In the code below, I want to scrape data tables and combine them into one data frame.
During web scraping, some address links do not work, so the scraping stops partway through the process (error location: doc = read_html(i, encoding = 'UTF-8')).
How can I skip the erroneous links and complete the iteration over the whole vector?
library(rvest)
library(dplyr)

fdata = data.frame()
n = 1
len = length(data$address)
for (i in data$address) {
  doc = read_html(i, encoding = 'UTF-8')  # stops the whole loop when a link is broken
  dtable = doc %>% html_table()
  fdata = bind_rows(fdata, dtable)
  print(n/len*100)  # progress in percent
  n = n + 1
}
Upvotes: 0
Views: 47
Reputation: 8072
You can also use possibly from purrr to return NULL on errors: build a function to scrape your table, then iterate and bind with map_dfr.
library(purrr)
library(rvest)

# possibly() returns NULL instead of throwing when read_html() fails
read_possible <- possibly(read_html, otherwise = NULL)

scrape_table <- function(address) {
  doc <- read_possible(address, encoding = 'UTF-8')
  if (is.null(doc)) {
    NULL
  } else {
    html_table(doc)
  }
}

# NULL results are silently dropped when map_dfr() binds the rows
map_dfr(data$address, scrape_table)
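If you also want to know which links failed, purrr::safely() captures the error alongside the result; a minimal sketch (the results and failed names are illustrative, not from the answer above):

library(purrr)
library(rvest)

# safely() wraps read_html so each call returns a list with
# $result (the parsed page, or NULL) and $error (NULL, or the error object)
read_safe <- safely(read_html)

results <- map(data$address, read_safe, encoding = 'UTF-8')

# addresses whose $result is NULL are the ones that failed
failed <- data$address[map_lgl(results, ~ is.null(.x$result))]

# bind the tables from the successful reads; NULL results are dropped
fdata <- map_dfr(results, ~ if (!is.null(.x$result)) html_table(.x$result))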
Upvotes: 1
Reputation: 5281
Simply adding try(), combined with next on error, will do, e.g.
fdata = data.frame()
n = 1
len = length(data$address)
for (i in data$address) {
  # try() returns a 'try-error' object instead of stopping the loop
  doc = try(read_html(i, encoding = 'UTF-8'), silent = TRUE)
  if (inherits(doc, 'try-error')) next  # skip the broken link
  dtable = doc %>% html_table()
  fdata = bind_rows(fdata, dtable)
  print(n/len*100)  # progress in percent
  n = n + 1
}
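The same pattern works with base R's tryCatch(), if you prefer to convert the error to NULL inline rather than test the class afterwards; a sketch under the same assumptions as the loop above:

fdata = data.frame()
for (i in data$address) {
  doc = tryCatch(
    read_html(i, encoding = 'UTF-8'),
    error = function(e) NULL  # swallow the error, return NULL instead
  )
  if (is.null(doc)) next  # skip the broken link
  fdata = bind_rows(fdata, html_table(doc))
}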
Upvotes: 1