Abram Fleishman

Reputation: 169

Is there a fast method to read image dimensions (W x H) in R

I have a folder of 90 million images and a table with a file_path column holding the full path to each file. I need to read the height and width of every image in the folder and save the result as a table that can be read into R.

I have tried the exifr package in R (which uses exiftool to do the work), but it is slow (>3 hours to scan my folder). Is there a faster way to achieve my goal? I am not tied to native R functions, but if the tool is written in a different language I would like to be able to call it from R using system or system2.

library(exifr)
dat<-data.frame(file_path = list.files("path/to/folder", 
                                       pattern = "\\.jpg$",
                                       full.names = TRUE,
                                       recursive = TRUE))

im_dims<-read_exif(dat$file_path,tags = c("ExifImageWidth", "ExifImageHeight"))

Upvotes: 3

Views: 1035

Answers (1)

Rui Barradas

Reputation: 76565

I don't know how fast ImageMagick is compared with exifr, but it is the fastest of the 3 options below. The R function get_image_dimensions (labelled Colitti in the benchmark) comes from this SO answer and is included to make reprex happy. Note that the folder Pictures holds 33 files with a total size of 3.7 MB.

library(jpeg)

get_image_dimensions <- function(path) {
  # Ensure file exists
  if(!file.exists(path)) 
    stop("No file found", call. = FALSE)
  
  # Ensure file ends with .png or .jpg or jpeg
  if (!grepl("\\.(png|jpg|jpeg)$", x = path, ignore.case = TRUE))
    stop("File must end with .png, .jpg, or .jpeg", call. = FALSE)
  
  # Get return of file system command (path is quoted so spaces don't break the call)
  s <- system(paste("file", shQuote(path)), intern = TRUE)
  
  # Extract width and height from string
  width <- regmatches(s, gregexpr("(?<=, )[0-9]+(?=(x| x )[0-9]+,)", s, perl = TRUE))[[1]]
  height <- regmatches(s, gregexpr(", [0-9]+(x| x )\\K[0-9]+(?=,)", s, perl = TRUE))[[1]] 
  setNames(as.numeric(c(width, height)), c("Width", "Height"))
}

magick_dim <- function(x, path = "."){
  fls <- list.files(path = path, pattern = x, full.names = TRUE)
  cmd <- 'magick'
  args <- c('identify', '-format', '"%w %h\n"', fls)
  d <- system2(cmd, args, stdout = TRUE)
  d <- strsplit(d, " ")
  y <- lapply(d, as.integer)
  setNames(y, basename(fls))
}

magick_dim("\\.jpg")
#> named list()

od <- getwd()
setwd("~/Rui/Pictures")
fls <- list.files(pattern = "\\.jpg")

length(fls)
#> [1] 33

library(microbenchmark)

mb <- microbenchmark(
  readJPEG = lapply(fls, \(x) dim(readJPEG(x))),
  Colitti = lapply(fls, get_image_dimensions),
  magick = magick_dim("\\.jpg"),
  times = 5
)
print(mb, order = "median")
#> Unit: milliseconds
#>      expr       min        lq     mean    median       uq      max neval cld
#>    magick  896.0296  983.3561 1037.211  992.1392 1115.144 1199.387     5 a  
#>  readJPEG 2252.2964 2346.6609 2510.984 2350.2388 2572.611 3033.112     5  b 
#>   Colitti 7271.8500 7382.9254 7540.919 7618.5121 7692.957 7738.351     5   c
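
readJPEG decodes every pixel just to report the dimensions, which is presumably part of why it is slower here. In a JPEG file the width and height are stored in the start-of-frame (SOF) segment near the top of the file, so in principle only the header needs to be read. Below is a rough base-R sketch of that idea. It is not one of the three options benchmarked above, jpeg_dims and its max_bytes argument are names made up for illustration, it assumes a well-formed JPEG whose SOF segment lies within the first max_bytes bytes, and I have not timed it.

```r
# Hypothetical helper: read JPEG width/height from the header only.
# Assumes the SOF segment sits within the first `max_bytes` bytes.
jpeg_dims <- function(path, max_bytes = 262144L) {
  na <- c(Width = NA_integer_, Height = NA_integer_)
  b <- as.integer(readBin(path, what = "raw", n = max_bytes))
  n <- length(b)
  # A JPEG file starts with the SOI marker FF D8
  if (n < 4L || b[1] != 0xFF || b[2] != 0xD8) return(na)
  i <- 3L
  while (i + 3L <= n) {
    if (b[i] != 0xFF) break                       # lost sync, give up
    marker <- b[i + 1L]
    if (marker == 0xFF) { i <- i + 1L; next }     # fill byte before a marker
    if (marker == 0x01 || (marker >= 0xD0 && marker <= 0xD9)) {
      i <- i + 2L; next                           # markers without a length field
    }
    if (marker == 0xDA) break                     # start of scan: no SOF found
    len <- b[i + 2L] * 256L + b[i + 3L]           # big-endian segment length
    is_sof <- marker >= 0xC0 && marker <= 0xCF &&
      !(marker %in% c(0xC4, 0xC8, 0xCC))          # exclude DHT, JPG, DAC
    if (is_sof) {
      if (i + 8L > n) break
      # Layout after the length: precision (1), height (2), width (2)
      height <- b[i + 5L] * 256L + b[i + 6L]
      width  <- b[i + 7L] * 256L + b[i + 8L]
      return(c(Width = width, Height = height))
    }
    i <- i + 2L + len                             # skip to the next segment
  }
  na
}
```

It returns NA for both values when no SOF segment is found, and something like t(vapply(fls, jpeg_dims, integer(2))) would give one row per file.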

Created on 2022-03-29 by the reprex package (v2.0.1)
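
Since the question asks for a table keyed by file_path rather than a named list, here is a minimal sketch of how the ImageMagick call could be adapted. The function names parse_dims and magick_dims_df and the chunk_size argument are hypothetical. The sketch assumes magick is on the PATH, that no file path ends in something like |123|456, and that batches of 500 files stay under the system's command-line length limit. As far as I know %i is the input file name in ImageMagick's format escapes and -ping avoids reading the pixel data, but I have not benchmarked this, and the quoting is untested on Windows.

```r
# Parse lines of the form "<path>|<width>|<height>" into a data frame.
# Lines that do not match (e.g. warnings) are dropped.
parse_dims <- function(lines) {
  pat <- "^(.*)\\|([0-9]+)\\|([0-9]+)$"
  lines <- lines[grepl(pat, lines)]
  data.frame(
    file_path = sub(pat, "\\1", lines),
    width     = as.integer(sub(pat, "\\2", lines)),
    height    = as.integer(sub(pat, "\\3", lines)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical wrapper: call `magick identify` once per batch of files
# so that a single command line never gets too long.
magick_dims_df <- function(files, chunk_size = 500L) {
  if (length(files) == 0L) return(parse_dims(character()))
  chunks <- split(files, ceiling(seq_along(files) / chunk_size))
  out <- lapply(chunks, function(fls) {
    args <- c("identify", "-ping", "-format",
              shQuote("%i|%w|%h\\n"),   # ImageMagick expands the \n itself
              shQuote(fls))             # quote paths that contain spaces
    parse_dims(system2("magick", args, stdout = TRUE))
  })
  do.call(rbind, unname(out))
}

# Usage (not run here, needs ImageMagick and real files):
# dims <- magick_dims_df(dat$file_path)
# write.csv(dims, "image_dims.csv", row.names = FALSE)
```

Parsing is kept separate from the system call so it can be checked without ImageMagick installed, and the resulting CSV can be read back with read.csv.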

Upvotes: 6
