Reputation: 133
The goal here is to download a bunch of images, but some of the image URLs are broken. I want to modify the code with a simple next statement so that any link that returns something other than status code 200 (a 404, for example) is skipped and the code moves on to the next URL. I am not sure how to write this in vectorized code, and when I try a for loop I cannot figure out how to initialize a vector of type "picture" to write to inside the loop. So now I am reading the source of the function, trying to figure out where the error gets raised and where to put the next statement or something akin to it, in case a next statement cannot be used in vectorized code:
Simple Vectorized Code:
library(magick)
library(rsvg)
image_urls <- na.omit(articles$url_to_image)
image_content <- image_read(image_urls)
Opaque "Function" Code (where does the error get raised? It looks like just a series of calls for reading different types of images):
function (path, density = NULL, depth = NULL, strip = FALSE,
    coalesce = TRUE, defines = NULL)
{
    if (is.numeric(density))
        density <- paste0(density, "x", density)
    density <- as.character(density)
    depth <- as.integer(depth)
    #doesn't seem relevant: https://rdrr.io/cran/magick/src/R/defines.R
    defines <- validate_defines(defines)
    #test whether the object is an instance of an S4 class and a function to test inheritance relationships between object and class -- seems relevant maybe?
    image <- if (isS4(path) && methods::is(path, "Image"))
    {
        #bioconductor class
        convert_EBImage(path)
    }
    else if (inherits(path, "nativeRaster") || (is.matrix(path) &&
        is.integer(path))) {
        image_read_nativeraster(path)
    }
    else if (inherits(path, "cimg")) {
        image_read_cimg((path))
    }
    else if (grDevices::is.raster(path)) {
        image_read_raster2(path)
    }
    else if (is.matrix(path) && is.character(path)) {
        image_read_raster2(grDevices::as.raster(path))
    }
    else if (is.array(path)) {
        image_readbitmap(path)
    }
    else if (is.raw(path)) {
        magick_image_readbin(path, density, depth, strip, defines)
    }
    else if (is.character(path) && all(nchar(path))) {
        path <- vapply(path, replace_url, character(1))
        path <- if (is_windows()) {
            enc2utf8(path)
        }
        else {
            enc2native(path)
        }
        magick_image_readpath(path, density, depth, strip, defines)
    }
    else {
        stop("path must be URL, filename or raw vector")
    }
    if (is.character(path) && !isTRUE(magick_config()$rsvg)) {
        if (any(grepl("\\.svg$", tolower(path))) || any(grepl("svg|mvg",
            tolower(image_info(image)$format)))) {
            warning("ImageMagick was built without librsvg which causes poor qualty of SVG rendering.\nFor better results use image_read_svg() which uses the rsvg package.",
                call. = FALSE)
        }
    }
    if (isTRUE(coalesce) && length(image) > 1 && identical("GIF",
        toupper(image_info(image)$format[1]))) {
        return(image_coalesce(image))
    }
    return(image)
}
When the link is broken, it returns: Error in download_url(path) : Failed to download "link" (HTTP 404)
Possible For Loop Code?
library(magick)
library(rsvg)
image_urls <- na.omit(articles$url_to_image)
image_content <- c() #doesn't work, nor does NULL
#nor does setting to typeof image_content <- image_url[1]
for(i in 1:length(image_urls)){
image_content[i] = image_read(image_urls[i])
if(grepl('404', download_path(url), fixed = TRUE) == T)
next
}
But again, I cannot initialize, and I don't know if the loop will break before it gets to the if statement in any case.
Maybe there is another library I should use... or just another language?
Here is some sample data
data <- c("https://img-s-msn-com.akamaized.net/tenant/amp/entityid/AAOgEbG.img?h=488&w=799&m=6&q=60&o=f&l=f",
"https://img-s-msn-com.akamaized.net/tenant/amp/entityid/AAOh6FW.img?h=533&w=799&m=6&q=60&o=f&l=f",
"https://img-s-msn-com.net/tenant/amp/entityid/AAOgIFh.img?h=450&w=799&m=6&q=60&o=f&l=f&x=570&y")
Upvotes: 2
Views: 373
Reputation: 16978
You could try the try() function:
image_urls <- data
image_content <- lapply(seq_along(image_urls), function(i) try(image_read(image_urls[i])))
This stores your images in a list. Using image_content[[1]] gives you access to the first image. If there are errors like
Error in curl::curl_fetch_memory(url) :
Could not resolve host: img-s-msn-com.net simpleError in curl::curl_fetch_memory(url)
those are caught instead of stopping the run: the failing element is stored as an object of class "try-error", and lapply proceeds to the next URL.
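Because a failed download stays in the list as a "try-error" object, you will probably want to drop those elements afterwards. Below is a minimal sketch of that step, plus the for-loop variant with next that the question asked about. To keep it runnable offline, it uses a hypothetical stand-in function, fake_read, and made-up example.com URLs instead of magick::image_read and the real links; swapping image_read back in should work the same way, but that part is an assumption rather than something tested here.

```r
# Hypothetical stand-in for magick::image_read(), so the sketch runs offline:
# it fails for any URL containing "broken".
fake_read <- function(url) {
  if (grepl("broken", url, fixed = TRUE)) {
    stop("Failed to download (HTTP 404)")
  }
  paste("image from", url)
}

# Made-up URLs for illustration only
urls <- c(
  "https://example.com/a.jpg",
  "https://example.com/broken.jpg",
  "https://example.com/c.jpg"
)

# 1) lapply + try: failures are kept in the list as "try-error" objects
results <- lapply(urls, function(u) try(fake_read(u), silent = TRUE))
ok <- !vapply(results, inherits, logical(1), what = "try-error")
good_results <- results[ok]   # successful reads only
failed_urls  <- urls[!ok]     # URLs worth inspecting or retrying

# 2) for loop + tryCatch: preallocate a list, `next` on failure
loop_results <- vector("list", length(urls))
for (i in seq_along(urls)) {
  img <- tryCatch(fake_read(urls[i]), error = function(e) NULL)
  if (is.null(img)) next      # skip this URL; slot i stays NULL
  loop_results[[i]] <- img
}
# Drop the slots that were skipped
loop_results <- Filter(Negate(is.null), loop_results)
```

The key to the loop version is initializing with vector("list", n) rather than c(), since a list can hold any kind of object. With real magick images, the surviving elements can then be combined into a single image vector with something like magick::image_join(good_results).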
Upvotes: 4
Reputation: 4243
Another option is to use purrr::safely to create a "safe" version of image_read, which will return both result and error for each URL. Results can be extracted from the list using something like purrr::map(y, `[[`, 'result').
# two working links and one broken
urls <- c("https://img-s-msn-com.akamaized.net/tenant/amp/entityid/AAOgEbG.img?h=488&w=799&m=6&q=60&o=f&l=f",
"https://img-s-msn-com.akamaized.net/tenant/amp/entityid/AAOh6FW.img?h=533&w=799&m=6&q=60&o=f&l=f",
"https://img-s-msn-com.net/tenant/amp/entityid/AAOgIFh.img?h=450&w=799&m=6&q=60&o=f&l=f&x=570&y")
# create 'safe' function
image_read_safe <- purrr::safely(magick::image_read)
# apply 'safe' function
y <- purrr::map(urls, image_read_safe)
y
#> [[1]]
#> [[1]]$result
#> format width height colorspace matte filesize density
#> 1 JPEG 799 488 sRGB FALSE 39743 96x96
#>
#> [[1]]$error
#> NULL
#>
#>
#> [[2]]
#> [[2]]$result
#> format width height colorspace matte filesize density
#> 1 JPEG 799 533 sRGB FALSE 53910 96x96
#>
#> [[2]]$error
#> NULL
#>
#>
#> [[3]]
#> [[3]]$result
#> NULL
#>
#> [[3]]$error
#> <simpleError in curl::curl_fetch_memory(url): Could not resolve host: img-s-msn-com.net>
Created on 2021-09-10 by the reprex package (v2.0.0)
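To go from the safely() output to just the usable images, the NULL results need to be dropped. Here is a rough sketch of that step; it again uses a hypothetical stand-in function (fake_read) and made-up example.com URLs so that it runs without network access, so treat it as an illustration of the pattern rather than a tested magick workflow.

```r
library(purrr)

# Hypothetical stand-in for magick::image_read(): fails on "broken" URLs
fake_read <- function(url) {
  if (grepl("broken", url, fixed = TRUE)) {
    stop("Failed to download (HTTP 404)")
  }
  paste("image from", url)
}

# Made-up URLs for illustration only
urls <- c(
  "https://example.com/a.jpg",
  "https://example.com/broken.jpg",
  "https://example.com/c.jpg"
)

fake_read_safe <- safely(fake_read)
y <- map(urls, fake_read_safe)

# Pull out the "result" slot of each element, then drop the NULLs
results <- compact(map(y, "result"))

# Flag which URLs produced an error, e.g. to log or retry them
failed <- map_lgl(y, function(el) !is.null(el$error))
failed_urls <- urls[failed]
```

Here results holds only the successful reads, and failed_urls records which links were skipped.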
Upvotes: 2