Rajesh Patrick Que

Reputation: 75

Loop through a list of URLs and retain only those that exist

I am in a situation where I need to check thousands of URLs to see if they exist, retain the ones that do in one object (I am calling it "exist"), and store those that don't in another object ("not exist").

urls <- paste0("https://en.wikipedia.org/wiki/List_of_countries_by_population_in_",1990:2020)

for(i in seq_along(urls)){
     exist <- keep(urls[i], http_error(urls[i]))
     not_exist <- discard(urls[i], http_error(urls[i]))
}

I want to avoid the loop and just stick to the purrr functions. I tried

exists <- map(urls, http_error)

But this just returns TRUE/FALSE.

My eventual goal is to create a table with two columns titled "exists" and "not exists" as follows:

\begin{table}[]
\begin{tabular}{ll}
Exists & Not Exists \\
URL    & URL        \\
URL    & URL        \\
URL    & URL        \\
URL    &            \\
URL    &            \\
URL    &
\end{tabular}
\end{table}

Upvotes: 2

Views: 290

Answers (2)

luismf

Reputation: 371

correct_urls <- purrr::keep(urls, ~!httr::http_error(.x))
incorrect_urls <- purrr::keep(urls, ~httr::http_error(.x))

Or, as another option, if you don't want to call http_error twice:

url_error <- purrr::map_lgl(urls, httr::http_error)
incorrect_urls <- urls[url_error]
correct_urls <- urls[!url_error]

Then you can put those URLs in the LaTeX table.
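A minimal sketch of that last step, assuming the `correct_urls` / `incorrect_urls` vectors from above and that the knitr package is installed: the two vectors will usually have different lengths, so pad the shorter one with NA before binding them into a data frame, then let `knitr::kable()` emit the LaTeX tabular.

```r
library(knitr)

# Pad both vectors to a common length; assigning a longer
# `length()` fills the tail with NA.
n <- max(length(correct_urls), length(incorrect_urls))
length(correct_urls) <- n
length(incorrect_urls) <- n

tab <- data.frame(`Exists` = correct_urls,
                  `Not Exists` = incorrect_urls,
                  check.names = FALSE)

# Prints a \begin{tabular} block; NA cells render as "NA",
# which you can blank out beforehand if preferred.
kable(tab, format = "latex")
```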

Upvotes: 1

Rajesh Patrick Que

Reputation: 75

I came up with a rather clunky solution; it works, but it scores very low on efficiency and elegance.

urls <- paste0("https://en.wikipedia.org/wiki/List_of_countries_by_population_in_",1990:2020)
library(httr)
library(purrr)
library(dplyr)

safe_url_logical <- map(urls, http_error)

temp <- cbind(unlist(safe_url_logical), unlist(urls))

colnames(temp) <- c("logical","url")
temp <- as.data.frame(temp)

safe_urls <- temp %>% 
     dplyr::filter(logical=="FALSE")
dead_urls <- temp %>% 
     dplyr::filter(logical=="TRUE")
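For what it's worth, the `cbind`/`unlist` detour can be dropped by building the data frame directly from a logical vector (a sketch along the lines of the other answer, assuming the same `urls` vector and the httr/purrr packages):

```r
library(httr)
library(purrr)

# One request per URL; map_lgl() returns a plain logical vector.
errored <- map_lgl(urls, http_error)

temp <- data.frame(url = urls, error = errored)

safe_urls <- temp$url[!temp$error]
dead_urls <- temp$url[temp$error]
```

This keeps `error` as a real logical column, so no string comparison against "TRUE"/"FALSE" is needed.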

Upvotes: 0
