Reputation: 21
I have 400+ JPEG images in a folder. I'm trying to write a script to read each image, identify some text in the image, and then write the file name and that text into a data frame.
When I run the script below, the first ~50 iterations each print a time of 0.1-0.3 seconds. Then, for a few iterations, each one takes 1-3 seconds. After that, the time jumps to 1-5 minutes per iteration, at which point I kill the script.
library(dplyr)
library(magick)
fileList3 = list.files(path = filePath)
printJobXRes = data.frame(
  jobName = as.character(),
  xRes = as.numeric(),
  stringsAsFactors = FALSE
)
i = 0
for (fileName in fileList3){
  img = paste0(filePath, '/', fileName, '_TestImage.jpg')
  start_time = Sys.time()
  temp.xRes = image_read(img, strip = TRUE) %>%
    image_rotate(270) %>%
    image_crop('90x150+1750') %>%
    image_negate %>%
    image_convert(type = 'Bilevel') %>%
    image_ocr %>%
    as.numeric
  stop_time = Sys.time()
  i = i+1
  print(paste(fileName,'first attempt, item #', i))
  print(stop_time-start_time)
  temp.df3 = data.frame(
    jobName = fileName,
    xRes = temp.xRes,
    stringsAsFactors = FALSE
  )
  printJobXRes = rbind(printJobXRes, temp.df3)
  rm(temp.xRes)
  rm(temp.df3)
}
Here are a few lines of the output:
#Images 1-49 process in .1-.3 seconds each
[1] "Image50.jpg first attempt, item # 50"
Time difference of 0.2320111 secs
[1] "Image51.jpg first attempt, item # 51"
Time difference of 0.213742 secs
[1] "Image52.jpg first attempt, item # 52"
Time difference of 0.2536581 secs
[1] "Image53.jpg first attempt, item # 53"
Time difference of 1.253844 secs
[1] "Image54.jpg first attempt, item # 54"
Time difference of 1.149764 secs
[1] "Image55.jpg first attempt, item # 55"
Time difference of 1.171134 secs
[1] "Image56.jpg first attempt, item # 56"
Time difference of 1.397093 secs
[1] "Image57.jpg first attempt, item # 57"
Time difference of 1.201915 secs
[1] "Image58.jpg first attempt, item # 58"
Time difference of 1.455768 secs
[1] "Image59.jpg first attempt, item # 59"
Time difference of 1.618744 secs
[1] "Image60.jpg first attempt, item # 60"
Time difference of 4.527751 mins
Can anyone suggest why the loop doesn't keep taking ~0.1-0.3 seconds per iteration? All the JPEGs are roughly the same size and resolution, and all were generated from the same source.
EDIT: changed the image names from Image1-Image11 to Image50-Image60 for clarity.
Upvotes: 1
Views: 115
Reputation: 21
I was able to solve my issue based on Mark's suggestion. I was removing the image from memory in each loop iteration, but R never actually reclaimed the freed memory. Adding a garbage collection call (gc()) inside the loop fixed the issue, and the loop then ran as expected.
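For anyone landing here later, this is a minimal sketch of what that change might look like, assuming the same magick pipeline as in the question. The function names read_x_res and process_images are hypothetical and were not in my original script. The sketch also fills a preallocated vector instead of calling rbind() on every iteration, and it takes full paths rather than building them with paste0(); both are my own simplifications, not part of the fix.

```r
# Hypothetical helper (not in the original script): the question's OCR
# pipeline for a single image, written with explicit magick:: calls.
read_x_res <- function(path) {
  img <- magick::image_read(path, strip = TRUE)
  img <- magick::image_rotate(img, 270)
  img <- magick::image_crop(img, "90x150+1750")
  img <- magick::image_negate(img)
  img <- magick::image_convert(img, type = "Bilevel")
  as.numeric(magick::image_ocr(img))
}

# Hypothetical wrapper: process every path, calling gc() once per iteration.
process_images <- function(paths, read_one = read_x_res) {
  xRes <- numeric(length(paths))  # preallocated result vector
  for (i in seq_along(paths)) {
    xRes[i] <- read_one(paths[i])
    gc()  # run a garbage collection now instead of waiting for R to trigger one
  }
  data.frame(jobName = basename(paths), xRes = xRes, stringsAsFactors = FALSE)
}

# Usage, assuming filePath is the image folder from the question:
# printJobXRes <- process_images(list.files(path = filePath, full.names = TRUE))
```

Calling gc() on every iteration has a cost of its own, so if it turns out to be noticeable, running it every N images is an option I have not measured.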
Upvotes: 1