Reputation: 2017
I'm scraping a table.
dput(head(temp_data))
structure(list(link = c("http://ujp.gov.mk/mk/prebaruvanje_pravni_lica/prikazi?edb=MK4030998342636",
"http://ujp.gov.mk/mk/prebaruvanje_pravni_lica/prikazi?edb=MK4030998342636",
"http://ujp.gov.mk/mk/prebaruvanje_pravni_lica/prikazi?edb=MK4030998378860",
"http://ujp.gov.mk/mk/prebaruvanje_pravni_lica/prikazi?edb=MK4030998346429",
"http://ujp.gov.mk/mk/prebaruvanje_pravni_lica/prikazi?edb=MK4030998346429",
"http://ujp.gov.mk/mk/prebaruvanje_pravni_lica/prikazi?edb=MK4030998346429"
)), .Names = "link", row.names = c(NA, 6L), class = "data.frame")
My code:
new_function <- function() {
  for (i in 1:nrow(temp_data)) {
    temp_data_point <- temp_data[i, ]
    file <- read_html(temp_data_point)
    tables <- html_nodes(file, "table")
    table1 <- html_table(tables[8], fill = TRUE)
    table2 <- as.data.frame(table1)
    table2 <- table2[15:24, 1:2]
    colnames(table2)[1] <- "variables"
    colnames(table2)[2] <- "results"
    table2[1, 1] <- "name"
    table2[2, 1] <- "legal_form"
    table2[3, 1] <- "industry"
    table2[4, 1] <- "tax_num"
    table2[5, 1] <- "id"
    table2[6, 1] <- "account_num"
    table2[7, 1] <- "bank_name"
    table2[8, 1] <- "address"
    table2[9, 1] <- "location"
    table2[10, 1] <- "phone"
    test2 <- spread(table2, variables, results)
    temp_table3[i, ] <- test2
  }
  return(temp_table3)
}
The problem arises when one of the URLs does not contain a table. For example:
With the non-working link I get:
Error in open.connection(x, "rb") : HTTP error 500
Any ideas on how I can implement an if statement that checks whether the link contains the table and, if not, skips to the next iteration? Perhaps a tryCatch?
Upvotes: 0
Views: 1392
Reputation: 1914
Let's say you have a link whose page has no table. First, check the status code of the request.
library(httr)
r = GET("http://www.ujp.gov.mk/mk/prebaruvanje_pravni_lica/prikazi?edb=MK4019999105375")
status = status_code(r)
Then use a conditional statement inside your loop. If the status code is not equal to 500, go ahead and parse the table. Otherwise, jump to the next iteration.
if (status != 500) {
  # parse table
} else {
  next # jump to next iteration
}
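Putting the two steps together inside a loop might look like the sketch below. It is an illustration rather than a drop-in replacement for the function in the question: `scrape_links`, `get_status`, and `parse_page` are hypothetical names. It also makes two assumptions that go beyond the check above: the status lookup is wrapped in `tryCatch`, since a dropped connection raises an error instead of returning a status code, and anything other than 200 is skipped, which is stricter than testing for 500 alone.

```r
# Hypothetical helper: visit each link, skip the ones that fail, collect the rest.
# get_status(url) returns the HTTP status code; parse_page(url) returns a one-row data frame.
scrape_links <- function(urls, get_status, parse_page) {
  rows <- list()
  for (url in urls) {
    # A failed connection raises an error rather than returning a code, so map it to NA.
    status <- tryCatch(get_status(url), error = function(e) NA_integer_)
    if (is.na(status) || status != 200) {
      message("Skipping ", url, " (status: ", status, ")")
      next  # jump to the next iteration
    }
    rows[[length(rows) + 1]] <- parse_page(url)
  }
  do.call(rbind, rows)  # NULL when every link was skipped
}

# Offline demonstration with stand-in functions (no network access needed).
fake_status <- function(url) {
  if (grepl("down", url)) stop("could not connect")
  if (grepl("bad", url)) 500L else 200L
}
fake_parse <- function(url) data.frame(link = url, stringsAsFactors = FALSE)

result <- scrape_links(c("ok-1", "bad-1", "down-1", "ok-2"), fake_status, fake_parse)
print(result$link)
```

With real pages, `get_status` could be `function(u) httr::status_code(httr::GET(u))`, and `parse_page` would hold the `read_html()` and `html_table()` steps from the question. Collecting rows in a list and binding them once at the end also avoids assigning into a data frame that was never created before the loop.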
Upvotes: 3