Nick Sherman

In R, an image-processing loop takes an order of magnitude longer per iteration after ~50 iterations

I have 400+ JPEG images in a folder. I'm trying to write a script that reads each image, identifies some text in it, and then writes the file name and that text into a data frame.

When I run the script below, the first ~50 iterations each print a time of 0.1-0.3 seconds. Then, for a few iterations, each one takes 1-3 seconds. After that, iterations jump to 1-5 minutes each, at which point I kill the script.

library(dplyr)
library(magick)

fileList3 = list.files(path = filePath)

printJobXRes = data.frame(
                          jobName = as.character(),
                          xRes = as.numeric(),
                          stringsAsFactors = FALSE
                          )
i = 0

for (fileName in fileList3){
  img = paste0(filePath, '/', fileName, '_TestImage.jpg')
  start_time = Sys.time()
  temp.xRes = image_read(img, strip = TRUE) %>% 
    image_rotate(270) %>% 
    image_crop('90x150+1750') %>% 
    image_negate %>%
    image_convert(type = 'Bilevel') %>%
    image_ocr %>%
    as.numeric
  
  stop_time = Sys.time()
  i = i+1
  print(paste(fileName,'first attempt, item #', i))
  print(stop_time-start_time)
  
  temp.df3 = data.frame(
    jobName = fileName,
    xRes = temp.xRes,
    stringsAsFactors = FALSE
    )
  printJobXRes = rbind(printJobXRes, temp.df3)
  rm(temp.xRes)
  rm(temp.df3)
}

Here are a few lines of the output:

#Images 1-49 process in .1-.3 seconds each
[1] "Image50.jpg first attempt, item # 50"
Time difference of 0.2320111 secs
[1] "Image51.jpg first attempt, item # 51"
Time difference of 0.213742 secs
[1] "Image52.jpg first attempt, item # 52"
Time difference of 0.2536581 secs
[1] "Image53.jpg first attempt, item # 53"
Time difference of 1.253844 secs
[1] "Image54.jpg first attempt, item # 54"
Time difference of 1.149764 secs
[1] "Image55.jpg first attempt, item # 55"
Time difference of 1.171134 secs
[1] "Image56.jpg first attempt, item # 56"
Time difference of 1.397093 secs
[1] "Image57.jpg first attempt, item # 57"
Time difference of 1.201915 secs
[1] "Image58.jpg first attempt, item # 58"
Time difference of 1.455768 secs
[1] "Image59.jpg first attempt, item # 59"
Time difference of 1.618744 secs
[1] "Image60.jpg first attempt, item # 60" 
Time difference of 4.527751 mins

Can anyone suggest why the loop doesn't keep taking ~0.1-0.3 seconds per iteration? All the JPEGs are roughly the same size and resolution, and all were generated from the same source.

EDIT: changed the image names from Image1-Image11 to Image50-Image60 for clarity.

Answers (1)

Nick Sherman

I was able to solve my issue based on Mark's suggestion. I was removing objects from memory with rm() in each loop iteration, but R never actually reclaimed the freed-up memory. Adding a garbage collection call (gc()) inside the loop fixed the issue, and the loop then ran as expected.
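A minimal sketch of that fix, in R to match the question. It reuses the question's pipeline, path construction, and crop geometry unchanged, and assumes the tesseract package is installed (magick::image_ocr() depends on it). The function name extract_xres() and the gc_every parameter are hypothetical, added only for illustration. My understanding, not confirmed in this thread, is that magick holds each image's pixel data in memory allocated by ImageMagick rather than on R's heap, so R's garbage collector does not register that memory pressure and may not run often enough on its own. Calling gc() explicitly after dropping the last reference to an image forces it to be released.

```r
# Sketch of the question's loop with an explicit gc() call added.
# Assumptions: `filePath` points at the image folder, the path construction
# matches the question's naming scheme, and tesseract is installed so that
# magick::image_ocr() works. extract_xres() and gc_every are hypothetical names.
extract_xres <- function(filePath, fileNames, gc_every = 1) {
  # Collect one small data frame per file in a pre-sized list and bind once
  # at the end, instead of growing a data frame with rbind() each iteration.
  results <- vector("list", length(fileNames))

  for (i in seq_along(fileNames)) {
    fileName <- fileNames[i]
    img_path <- paste0(filePath, '/', fileName, '_TestImage.jpg')

    # Same processing steps as the question; each reassignment drops the
    # R reference to the previous intermediate image.
    img <- magick::image_read(img_path, strip = TRUE)
    img <- magick::image_rotate(img, 270)
    img <- magick::image_crop(img, '90x150+1750')
    img <- magick::image_negate(img)
    img <- magick::image_convert(img, type = 'Bilevel')
    xRes <- as.numeric(magick::image_ocr(img))

    results[[i]] <- data.frame(jobName = fileName, xRes = xRes,
                               stringsAsFactors = FALSE)

    # Drop the last reference to the image, then run the garbage collector so
    # the now-unreferenced images (and their ImageMagick memory) are freed.
    rm(img)
    if (i %% gc_every == 0) gc()
  }

  # do.call(rbind, list()) returns NULL, so handle the empty case explicitly.
  if (length(results) == 0) {
    return(data.frame(jobName = character(), xRes = numeric(),
                      stringsAsFactors = FALSE))
  }
  do.call(rbind, results)
}
```

Usage, assuming filePath is set as in the question: printJobXRes <- extract_xres(filePath, list.files(path = filePath)). Setting gc_every to a larger value, such as 10, calls gc() less often, trading somewhat higher peak memory for less time spent collecting.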