Reputation: 169
I have a folder of 90 million images. I have a table with a column file_path holding the full path to each file. I need to read the height and width of every image in the folder and save the result as a table that can be read into R.
I have tried the exifr package in R (which uses exiftool to do the work), but it is slow (>3 hours to scan my folder). Is there a faster way to achieve my goal? I am not tied to native R functions, but if the tool is written in a different language I would like to be able to call it from R using system or system2.
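library(exifr)
# List every .jpg file under the folder, recursively
dat <- data.frame(file_path = list.files("path/to/folder",
                                         pattern = "\\.jpg$",
                                         full.names = TRUE,
                                         recursive = TRUE))
# Read the width and height tags for each file
im_dims <- read_exif(dat$file_path, tags = c("ExifImageWidth", "ExifImageHeight"))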
Upvotes: 3
Views: 1035
Reputation: 76565
I don't know how fast ImageMagick is compared with exifr, but it is the fastest of the 3 options below. The R function get_image_dimensions comes from this SO answer and is included to make reprex happy. Note that I have 33 files in the folder Pictures, with a total size of 3.7 MB.
library(jpeg)
get_image_dimensions <- function(path) {
  # Ensure file exists
  if (!file.exists(path))
    stop("No file found", call. = FALSE)
  # Ensure file ends with .png, .jpg, or .jpeg
  if (!grepl("\\.(png|jpg|jpeg)$", x = path, ignore.case = TRUE))
    stop("File must end with .png, .jpg, or .jpeg", call. = FALSE)
  # Get return of file system command (path is quoted so spaces survive the shell)
  s <- system(paste0("file ", shQuote(path)), intern = TRUE)
  # Extract width and height from string
  width <- regmatches(s, gregexpr("(?<=, )[0-9]+(?=(x| x )[0-9]+,)", s, perl = TRUE))[[1]]
  height <- regmatches(s, gregexpr(", [0-9]+(x| x )\\K[0-9]+(?=,)", s, perl = TRUE))[[1]]
  setNames(as.numeric(c(width, height)), c("Width", "Height"))
}
magick_dim <- function(x, path = "."){
  fls <- list.files(path = path, pattern = x, full.names = TRUE)
  cmd <- 'magick'
  args <- c('identify', '-format', '"%w %h\n"', fls)
  d <- system2(cmd, args, stdout = TRUE)
  d <- strsplit(d, " ")
  y <- lapply(d, as.integer)
  setNames(y, basename(fls))
}
magick_dim("\\.jpg")
#> named list()
od <- getwd()
setwd("~/Rui/Pictures")
fls <- list.files(pattern = "\\.jpg")
length(fls)
#> [1] 33
library(microbenchmark)
mb <- microbenchmark(
  readJPEG = lapply(fls, \(x) dim(readJPEG(x))),
  Colitti = lapply(fls, get_image_dimensions),
  magick = magick_dim("\\.jpg"),
  times = 5
)
print(mb, order = "median")
#> Unit: milliseconds
#>      expr       min        lq     mean    median       uq      max neval cld
#>    magick  896.0296  983.3561 1037.211  992.1392 1115.144 1199.387     5 a
#>  readJPEG 2252.2964 2346.6609 2510.984 2350.2388 2572.611 3033.112     5  b
#>   Colitti 7271.8500 7382.9254 7540.919 7618.5121 7692.957 7738.351     5   c
Created on 2022-03-29 by the reprex package (v2.0.1)
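The question asks for a table rather than a named list, and with millions of files the paths cannot all go on one command line. Below is a minimal sketch of one way to adapt the magick approach. It assumes magick is on the PATH and every input is a single-frame image such as a JPEG. The names identify_batch, parse_dims, image_dims, batch_size and the output file image_dims.csv are made up for illustration. I have not timed this on a large folder. The format string adds %i (the input file name), so each output row carries its own path instead of relying on output order.

```r
# Run ImageMagick on one batch of paths and return its raw output lines.
# Assumption: the `magick` executable is on the PATH.
identify_batch <- function(paths) {
  # "\\n" sends a literal backslash-n, which ImageMagick expands to a newline
  args <- c("identify", "-format", shQuote("%w %h %i\\n"), shQuote(paths))
  system2("magick", args, stdout = TRUE)
}

# Turn lines of the form "<width> <height> <path>" into a data frame.
# Lines that do not match the pattern (e.g. error messages) are dropped.
parse_dims <- function(lines) {
  m <- regmatches(lines, regexec("^([0-9]+) ([0-9]+) (.*)$", lines))
  m <- m[lengths(m) == 4]
  data.frame(
    file_path = vapply(m, `[`, character(1), 4),
    width     = as.integer(vapply(m, `[`, character(1), 2)),
    height    = as.integer(vapply(m, `[`, character(1), 3)),
    stringsAsFactors = FALSE
  )
}

# Split the paths into batches, run each batch, and bind the results.
# `run` is a parameter so a different tool (or a stub) can be swapped in.
image_dims <- function(paths, batch_size = 1000, run = identify_batch) {
  batches <- unname(split(paths, ceiling(seq_along(paths) / batch_size)))
  res <- do.call(rbind, lapply(batches, function(b) parse_dims(run(b))))
  rownames(res) <- NULL
  res
}

# Hypothetical usage with the question's table:
# dims <- image_dims(dat$file_path)
# write.csv(dims, "image_dims.csv", row.names = FALSE)
```

The batch size of 1000 is an arbitrary starting point. Larger batches mean fewer process launches but longer command lines, so it is worth benchmarking a few values on a sample of the folder.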
Upvotes: 6