Detecting invalid or corrupt jpg files with jpeg package in R

Question

I'd like to use the jpeg package (or similar) to detect corrupted .jpg files. I am sharing this code with users who have had trouble installing exiftool, so I'd prefer to use packages that do not require that program.

I want my code to catch images that are completely corrupt or that are partially corrupt (i.e., you can see part of the image, but some of it is cut off).

When an image is completely corrupt, readJPEG stops with an error:

Error in readJPEG(photos[35]) : 
  JPEG decompression error: Not a JPEG file: starts with 0x7b 0x28

When an image is partially corrupt, readJPEG prints the following but still returns the image:

JPEG decompression: Corrupt JPEG data: premature end of data segment

I want to write a function that will return FALSE if the image is "good" and TRUE if it is corrupted or partially corrupted. So far, I can't get my function to work if the image is partially corrupted (it returns FALSE). What am I doing wrong?

Here's an example of a "partially corrupt" image: the bottom half got cut off when it was transferred to a new device.

library(jpeg)

# Function to "catch" bad photos
is_corrupted <- function(x){
  tryCatch({
    check <- readJPEG(x)
    return(FALSE)
    },
    error = function(e)
      return(TRUE),
    warning = function(w)
      return(TRUE),
    message = function(m)
      return(TRUE)
    )
}
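Why this version returns FALSE for the partially corrupt file: judging by the behavior described above (none of the three handlers fires), jpeg appears to write libjpeg's recoverable warnings straight to the console's error stream instead of signalling an R message or warning, so tryCatch() has nothing to catch. The sketch below reproduces that situation with a hypothetical noisy_reader() that writes a line to stderr(): tryCatch() misses it, while capture.output(type = "message"), which diverts that stream, collects it.

```r
# Hypothetical stand-in for readJPEG() on a damaged file: it writes a warning
# line directly to the error stream instead of signalling an R condition.
noisy_reader <- function() {
  cat("JPEG decompression: Corrupt JPEG data: premature end of data segment\n",
      file = stderr())
  invisible(NULL)
}

# A message handler only fires for signalled conditions (message(), etc.),
# so this evaluates to FALSE even though text was printed.
caught_by_handler <- tryCatch({
  noisy_reader()
  FALSE
}, message = function(m) TRUE)

# Diverting the message stream with capture.output() does collect the text.
captured <- capture.output(noisy_reader(), type = "message")
caught_by_capture <- length(captured) > 0

caught_by_handler  # [1] FALSE
caught_by_capture  # [1] TRUE
```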

EDIT: Try number 2...

I created a modified function based on Ben's suggestions, but it still isn't returning TRUE if an image is completely corrupt. I also don't like how it tests the photo twice. Any recommendations appreciated!

To test the function, you can use three files: (1) any valid jpg from your computer, (2) the "partially corrupt" file linked in this question, and (3) a path to a file that doesn't exist, which throws an error that should be caught by tryCatch (e.g., is_corrupted("")).
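If the linked sample file is not at hand, comparable test inputs can be generated. This is a sketch under stated assumptions: the folder and file names are arbitrary, and a truncated copy of a freshly written JPEG only approximates a file damaged in transfer. The non-JPEG file deliberately begins with the bytes 0x7b 0x28, the same bytes reported in the error message in the question.

```r
library(jpeg)

# Hypothetical test folder; the file names below are arbitrary.
test_dir <- tempfile("jpeg-tests")
dir.create(test_dir)

# (1) A valid JPEG: 200 x 200 RGB noise written with writeJPEG().
good <- file.path(test_dir, "good.jpg")
set.seed(42)
writeJPEG(array(runif(200 * 200 * 3), dim = c(200, 200, 3)), good)

# (2) "Partially corrupt": keep only the first half of the bytes,
#     roughly mimicking a transfer that was cut off.
partial <- file.path(test_dir, "partial.jpg")
bytes <- readBin(good, what = "raw", n = file.size(good))
writeBin(bytes[seq_len(length(bytes) %/% 2)], partial)

# (3a) Not a JPEG at all: starts with "{(" (0x7b 0x28).
not_jpeg <- file.path(test_dir, "not_jpeg.jpg")
writeBin(charToRaw("{(this is not image data"), not_jpeg)

# (3b) A path that does not exist.
missing_file <- file.path(test_dir, "does_not_exist.jpg")
```

Each path can then be passed to is_corrupted(); only good should come back FALSE.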

is_corrupted <- function(x){
  message <- capture.output(check2 <- readJPEG(x), type = "message")
  if(length(message) > 0) {
    corrupt <- TRUE
  } else {
    corrupt <- tryCatch({
      check <- readJPEG(x)
      return(FALSE)
    },
    error = function(e) # catch "corrupt" images
      return(TRUE)
    )
  }
  return(corrupt)
}
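The likely reason this version misses completely corrupt files: the capture.output() call runs before, and outside, tryCatch(), so for a file that is not a JPEG at all readJPEG() stops right there and the error propagates out of is_corrupted() before tryCatch() is ever reached. A minimal sketch that moves the capture inside tryCatch(), and therefore reads each file only once, assuming (as the question's output suggests) that jpeg reports recoverable libjpeg warnings as text on the message stream:

```r
library(jpeg)

is_corrupted <- function(x) {
  tryCatch({
    # invisible() stops capture.output() from printing the pixel array to
    # stdout; only text written to the message stream is collected.
    msgs <- capture.output(invisible(readJPEG(x)), type = "message")
    # Any libjpeg warning (e.g. "premature end of data segment")
    # is treated as a sign of partial corruption.
    length(msgs) > 0
  },
  # readJPEG() stops on unreadable input ("Not a JPEG file", missing file).
  error = function(e) TRUE)
}
```

If readJPEG() fails inside capture.output(), the message sink is restored on exit and the error then lands in the handler, so is_corrupted("") returns TRUE instead of stopping. Damage that libjpeg does not notice (a structurally intact file with scrambled pixel data, say) would still return FALSE.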

Ben Nutzer · Accepted Answer

I agree, this one is tricky. I think you need to have the error checking before the capturing part. I will post a temporary (ugly) solution, and hopefully someone else posts a more elegant and straightforward one.

readJPEG2 <- purrr::safely(readJPEG)
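purrr::safely() wraps a function so that, instead of stopping, it returns a list with result and error components, one of which is NULL (with the defaults otherwise = NULL and quiet = TRUE). That is why the answer can test is.null(readJPEG2(x)$error). A rough base-R equivalent shows the mechanism; safely_base() is a hypothetical name for illustration only, and it ignores safely()'s otherwise and quiet arguments:

```r
# Hypothetical base-R stand-in for purrr::safely(), for illustration only:
# wrap f so that it returns list(result, error) instead of stopping.
safely_base <- function(f) {
  function(...) {
    tryCatch(
      list(result = f(...), error = NULL),
      error = function(e) list(result = NULL, error = e)
    )
  }
}

safe_log <- safely_base(log)
ok  <- safe_log(100)
bad <- safe_log("a")  # log() of a character string stops with an error

is.null(ok$error)   # [1] TRUE: the call succeeded
is.null(bad$error)  # [1] FALSE: the error was captured, not thrown
```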

Let purrr do the error checking and, if there is none, proceed with examining the output:

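fun <- function(x){
  if(is.null(readJPEG2(x)$error)){
    # invisible() keeps capture.output() from printing the image array;
    # only text on the message stream (libjpeg warnings) is captured
    message2 <- capture.output(invisible(readJPEG(x)), type = "message")
    if(length(message2) > 0){
      return("partially corrupted")
    } else {
      return("complete")
    }
  } else {
    return("corrupted")
  }
}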

I don't know how robust this solution is, but maybe it helps anyway.
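The function in this answer reads every file twice (once through readJPEG2(), once inside capture.output()). A single-pass variant can keep the same three labels by putting the capture inside tryCatch(). In this sketch jpeg_status() and the "photos" folder are hypothetical names, and it relies on the same assumption: a partially corrupt file is recognized only by the text libjpeg writes to the message stream.

```r
library(jpeg)

# Hypothetical single-pass classifier: one readJPEG() call per file.
jpeg_status <- function(x) {
  tryCatch({
    # invisible() prevents the pixel array from being printed
    msgs <- capture.output(invisible(readJPEG(x)), type = "message")
    if (length(msgs) > 0) "partially corrupted" else "complete"
  },
  error = function(e) "corrupted")
}

# Example batch run; "photos" is a placeholder folder path.
photos <- list.files("photos", pattern = "\\.jpe?g$",
                     ignore.case = TRUE, full.names = TRUE)
status <- vapply(photos, jpeg_status, character(1))
table(status)
```

vapply() names each result by its file path, so names(status)[status != "complete"] lists the files that need attention.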
