Reputation: 75
I am in a situation where I need to check thousands of URLs to see if they exist, retaining the URLs that do exist in one object (I am calling it "exist") and storing those that don't in another ("not_exist"). Here is my attempt:
urls <- paste0("https://en.wikipedia.org/wiki/List_of_countries_by_population_in_",1990:2020)
for(i in seq_along(urls)){
exist <- keep(urls[i], http_error(urls[i]))
not_exist <- discard(urls[i], httpe_error(url[i])
}
I want to avoid the loop and stick to purrr functions, so I tried
exists <- map(urls, http_error)
But this just returns TRUE/FALSE values rather than the URLs themselves.
My eventual goal is to create a table with two columns titled "Exists" and "Not Exists", as follows:
\begin{table}[]
\begin{tabular}{ll}
Exists & Not Exists \\
URL & URL \\
URL & URL \\
URL & URL \\
URL & \\
URL & \\
URL &
\end{tabular}
\end{table}
Upvotes: 2
Views: 290
Reputation: 371
correct_urls <- purrr::keep(urls, ~!httr::http_error(.x))
incorrect_urls <- purrr::keep(urls, ~httr::http_error(.x))
Or, as another option if you don't want to call http_error() twice:
url_error <- purrr::map_lgl(urls, httr::http_error)
incorrect_urls <- urls[url_error]
correct_urls <- urls[!url_error]
Then you can put those URLs in the LaTeX table.
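For instance, here is a minimal sketch of building that two-column table, padding the shorter column with empty strings (the knitr package and the url_table name are my assumptions, not part of the original answer):
library(knitr)

# Pad both vectors to the same length so they fit in one data frame
n <- max(length(correct_urls), length(incorrect_urls))
url_table <- data.frame(
  Exists       = c(correct_urls,   rep("", n - length(correct_urls))),
  `Not Exists` = c(incorrect_urls, rep("", n - length(incorrect_urls))),
  check.names  = FALSE
)

# kable() renders the data frame as a LaTeX tabular
kable(url_table, format = "latex")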
Upvotes: 1
Reputation: 75
I came up with a rather clunky solution; it works, but it scores very low on efficiency and elegance.
urls <- paste0("https://en.wikipedia.org/wiki/List_of_countries_by_population_in_",1990:2020)
library(httr)
safe_url_logical <- map(urls, http_error)
temp <- cbind(unlist(safe_url_logical), unlist(urls))
colnames(temp) <- c("logical","url")
temp <- as.data.frame(temp)
safe_urls <- temp %>%
dplyr::filter(logical=="FALSE")
dead_urls <- temp %>%
dplyr::filter(logical=="TRUE")
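For comparison, a more compact sketch of the same idea that calls http_error() only once per URL (the url_status name is my own; this assumes the dplyr and purrr packages loaded above):
# One logical per URL, kept alongside the URL in a tibble
url_status <- tibble(url = urls, error = map_lgl(urls, http_error))
safe_urls <- url_status %>% filter(!error)
dead_urls <- url_status %>% filter(error)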
Upvotes: 0