Tim Visher

Reputation: 12854

How can I determine if a file is an image on the JVM?

I'd like to get the contents of a directory that contains arbitrary files (a typical "Downloads" directory) and determine programmatically whether a given file is an image of any type.

I'm working in Clojure but anything available on the JVM is fair game.

Thanks in advance!

Upvotes: 3

Views: 797

Answers (3)

Jake McCrary

Reputation: 1180

I ended up being able to solve this by combining the comment on your question with my earlier answer from here. A minor change to the code lets it work with files that are not images.

I didn't change it to recurse into subdirectories, but that would be easy enough to do.

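;; List the non-directory entries directly inside dir (no recursion).
(defn files-in-dir [dir]
  (filter #(not (.isDirectory %))
          (.listFiles (java.io.File. dir))))

;; ImageIO/read returns nil when no registered reader can decode the stream,
;; so files that are not images drop out via (remove nil? ...).
(defn figure-out-height-width
  [files]
  (remove nil?
          (map (fn [file]
                 (with-open [r (java.io.FileInputStream. file)]
                   (if-let [img (javax.imageio.ImageIO/read r)]
                     [file (.getWidth img) (.getHeight img)])))
               files)))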

user> (pprint (files-in-dir "/home/jmccrary/Downloads/"))                                                                              
(#<File /home/jmccrary/Downloads/Girl_Talk_-_All_Day_(IA123)_mp3s.zip>                                                                 
 #<File /home/jmccrary/Downloads/CSS3-for-Web-Designers.zip>                                                                           
 #<File /home/jmccrary/Downloads/manual.pdf>                                                                                           
 #<File /home/jmccrary/Downloads/test.jpeg>                                                                                            
 #<File /home/jmccrary/Downloads/nautilus-dropbox_0.6.7_amd64.deb>                                                                     
 #<File /home/jmccrary/Downloads/rubygems-1.3.7.tgz>                                                                                   
 #<File /home/jmccrary/Downloads/HTML5-FOR-WEB-DESIGNERS.zip>                                                                          
 #<File /home/jmccrary/Downloads/bcompare-3.1.11.12238.tar.gz>                                                                         
 #<File /home/jmccrary/Downloads/shared_ptr_example.cpp>)                                                                              
nil                                                                                                                                    
user> (figure-out-height-width (files-in-dir "/home/jmccrary/Downloads"))                                                              
([#<File /home/jmccrary/Downloads/test.jpeg> 32 32])
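
For the subdirectory case mentioned above, one possible sketch uses Clojure's built-in file-seq, which walks a directory tree. The name files-in-tree is hypothetical and not part of the original code; its result could be passed to figure-out-height-width in place of files-in-dir.

```clojure
;; Hypothetical helper: like files-in-dir, but descends into subdirectories.
;; file-seq returns the root, every directory, and every file beneath it,
;; so the directories are removed afterwards.
(defn files-in-tree [dir]
  (remove #(.isDirectory ^java.io.File %)
          (file-seq (java.io.File. ^String dir))))
```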

After thinking about it for a bit, it feels dirty to combine the check for a file being an image with pulling out the width and height. Alternatively, you could define a function that does this filtering separately and gives you a seq of images.

(defn filter-images                                                                                                                    
  [files]                                                                                                                              
  (reduce (fn [res file]                                                                                                                
            (if-let [img (javax.imageio.ImageIO/read file)]                                                                            
              (conj res img)                                                                                                           
              res))                                                                                                                    
          []                                                                                                                           
          files))

user> (filter-images (files-in-dir "/home/jmccrary/Downloads"))                                                                        
[#<BufferedImage BufferedImage@24753433: type = 5 ColorModel: #pixelBits = 24 numComponents = 3 color space = java.awt.color.ICC_ColorSpace@43036651 transparency = 1 has alpha = false isAlphaPre = false ByteInterleavedRaster: width = 32 height = 32 #numDataElements 3 dataOff[0] = 2>]
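
If only a yes/no answer is needed, a hedged variation is to ask ImageIO whether any registered reader claims the file, without decoding it into a BufferedImage. This is a sketch, not the code from the answer above: image-file? is a hypothetical name, and the result depends on which ImageIO plugins the JVM has installed.

```clojure
(import '(javax.imageio ImageIO))

;; Hypothetical predicate: true when at least one registered ImageIO reader
;; claims it can decode the file. Only the header is inspected; no pixels
;; are loaded.
(defn image-file? [^java.io.File file]
  (boolean
    ;; createImageInputStream returns nil if no stream provider fits.
    (when-let [stream (ImageIO/createImageInputStream file)]
      (with-open [in stream]
        (let [^java.util.Iterator readers (ImageIO/getImageReaders in)]
          (.hasNext readers))))))
```

Something like (filter image-file? (files-in-dir "/home/jmccrary/Downloads")) would then give back the files themselves rather than the decoded images.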

Upvotes: 3

Alex Ott

Reputation: 87069

You can use the Tika library, which can detect many types of files and also extract metadata from many of them. I have a very simple Clojure wrapper for it.

Upvotes: 4

eaj

Reputation: 2606

Obviously, the easiest thing to do is look at the filename extension. Of course, it's not necessarily reliable, but it may suffice in some circumstances.
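
A minimal sketch of the extension check in Clojure, assuming that "image" means "a format the JVM's ImageIO can read". The names image-suffixes and image-extension? are made up for illustration; a hand-written set of extensions would work just as well.

```clojure
(require '[clojure.string :as str])
(import '(javax.imageio ImageIO))

;; Lower-cased suffixes (e.g. "jpg", "png") reported by the ImageIO readers
;; installed in this JVM. The exact contents vary between JDKs.
(def image-suffixes
  (set (map str/lower-case (ImageIO/getReaderFileSuffixes))))

;; Hypothetical predicate: looks only at the file name, never at the contents.
(defn image-extension? [^java.io.File file]
  (let [fname (.getName file)
        dot   (.lastIndexOf fname ".")]
    ;; (pos? dot) rejects names with no dot and dotfiles such as ".png".
    (and (pos? dot)
         (contains? image-suffixes
                    (str/lower-case (subs fname (inc dot)))))))
```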

Short of reading the whole image, you could read the first few bytes of the file to identify it by its "magic number". For example, JPEG files always start with the two bytes 0xFFD8 and end with 0xFFD9; PDFs always begin with the string "%PDF".

This saves you the overhead of creating an image in memory, and might speed up your I/O as well (since you only need a few bytes of the file).
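
Here is a rough sketch of the magic-number idea in Clojure. The table is deliberately tiny and is an assumption on my part: it covers only the JPEG prefix mentioned above plus the well-known PNG and GIF signatures, and the names magic-numbers, leading-bytes and sniff-image-type are hypothetical.

```clojure
;; Leading byte signatures for a few formats (far from complete).
(def magic-numbers
  {:jpeg [0xFF 0xD8]
   :png  [0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A]
   :gif  [0x47 0x49 0x46 0x38]})            ; "GIF8"

;; Reads up to n bytes from the start of the file and returns them as a
;; vector of unsigned values (0-255). An empty file yields [].
(defn leading-bytes [file n]
  (with-open [in (java.io.FileInputStream. ^java.io.File file)]
    (let [buf (byte-array n)
          cnt (.read in buf)]
      (mapv #(bit-and (aget buf %) 0xFF)
            (range (max cnt 0))))))

;; Returns the keyword of the first signature that matches the start of the
;; file, or nil when none matches.
(defn sniff-image-type [file]
  (let [head (leading-bytes file 8)]
    (some (fn [[type magic]]
            (when (= magic (take (count magic) head))
              type))
          magic-numbers)))
```

Only eight bytes are read per file, so a file starting with "%PDF" simply comes back as nil here.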

If you don't want to research all these magic numbers yourself, you could try a library such as jMimeMagic. I've never used it, so I can't vouch for its quality or completeness, but it is LGPL. I'm sure you can find other alternatives as well.

Upvotes: 3
