Prometheus


R Scraping - skip HTTP error 500 in loop

I'm scraping a table.

dput(head(temp_data))
structure(list(link = c("http://ujp.gov.mk/mk/prebaruvanje_pravni_lica/prikazi?edb=MK4030998342636", 
"http://ujp.gov.mk/mk/prebaruvanje_pravni_lica/prikazi?edb=MK4030998342636", 
"http://ujp.gov.mk/mk/prebaruvanje_pravni_lica/prikazi?edb=MK4030998378860", 
"http://ujp.gov.mk/mk/prebaruvanje_pravni_lica/prikazi?edb=MK4030998346429", 
"http://ujp.gov.mk/mk/prebaruvanje_pravni_lica/prikazi?edb=MK4030998346429", 
"http://ujp.gov.mk/mk/prebaruvanje_pravni_lica/prikazi?edb=MK4030998346429"
)), .Names = "link", row.names = c(NA, 6L), class = "data.frame")

My code:

new_function <- function() {


for (i in 1:nrow(temp_data)) {

  temp_data_point <- temp_data[i, ]
  file <- read_html(temp_data_point)
  tables <- html_nodes(file, "table")
  table1 <- html_table(tables[8], fill = TRUE)
  table2 <- as.data.frame(table1)
  table2 <- table2[15:24 , 1:2]


  colnames(table2)[1] <- "variables"
  colnames(table2)[2] <- "results"


  table2[1, 1] <- "name"
  table2[2, 1] <- "legal_form"
  table2[3, 1] <- "industry"
  table2[4, 1] <- "tax_num"
  table2[5, 1] <- "id"
  table2[6, 1] <- "account_num"
  table2[7, 1] <- "bank_name"
  table2[8, 1] <- "address"
  table2[9, 1] <- "location"
  table2[10, 1] <- "phone"

  test2 <- spread(table2, variables, results)
  temp_table3[i, ] <- test2

}

return(temp_table3)

}

The problem arises when one of the URLs does not contain a table. With a non-working link I get:

Error in open.connection(x, "rb") : HTTP error 500

Any ideas how I can implement an if statement that checks whether the link contains the table and, if not, skips to the next iteration? Perhaps a tryCatch?


Answers (1)

Aleksandr


Let's say you have a link to a page with no table. First, check the status code of the request.

library(httr)
r = GET("http://www.ujp.gov.mk/mk/prebaruvanje_pravni_lica/prikazi?edb=MK4019999105375")
status = status_code(r)

Then use a conditional statement inside the loop. If the status code is not equal to 500, go ahead and parse the table. Otherwise, jump to the next iteration.

if (status != 500) {
  # parse table
} else {
  next # jump to next iteration
}
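Putting the two pieces together, here is a rough sketch of what the whole loop might look like. The helper names fetch_page and scrape_links are made up for illustration, and I am assuming the table you want is still the 8th one on the page, as in your code. It also wraps the request in tryCatch, so a connection failure is skipped in the same way as an error status.

```r
library(httr)
library(rvest)

# Hypothetical helper: returns the parsed page, or NULL when the request
# fails outright or the server answers with an error status (400 or above).
fetch_page <- function(url) {
  resp <- tryCatch(GET(url), error = function(e) NULL)
  if (is.null(resp) || http_error(resp)) {
    return(NULL)
  }
  read_html(resp)
}

# Hypothetical wrapper: loops over the links and keeps only the pages
# that could be fetched and that contain at least 8 tables.
scrape_links <- function(links) {
  results <- list()
  for (url in links) {
    page <- fetch_page(url)
    if (is.null(page)) next        # skip links that return an error
    tables <- html_nodes(page, "table")
    if (length(tables) < 8) next   # skip pages without the expected table
    results[[url]] <- html_table(tables[[8]], fill = TRUE)
  }
  results
}

# Usage, assuming temp_data from the question:
# scraped <- scrape_links(unique(temp_data$link))
```

Using http_error() rather than comparing against 500 also covers other failures such as 404, which may or may not be what you want. The result is a list named by URL, so you can see afterwards which links were skipped.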
