Lay András

Reputation: 855

Extract thumbnail from jpeg file

I'd like to extract the thumbnail image from JPEG files without any external library. This shouldn't be too difficult: I only need to know where the thumbnail starts and ends in the file, then cut it out. I have studied a lot of documentation (e.g. http://www.media.mit.edu/pia/Research/deepview/exif.html) and tried to analyze JPEGs, but not everything is clear. I tried to follow the bytes step by step, but got confused deep in the structure. Is there any good documentation or readable source code for finding the start and end position of the thumbnail within a JPEG file?

Thank you!

Upvotes: 28

Views: 59853

Answers (4)

Riot

Reputation: 16706

ExifTool can do this quickly and easily:

exiftool -b -ThumbnailImage my_image.jpg > my_thumbnail.jpg

Upvotes: 36

BitBank

Reputation: 8715

For most JPEG images created by phones or digital cameras, the thumbnail image (if present) is stored in the APP1 marker segment (FFE1). Inside this segment is a TIFF structure containing the EXIF information for the main image and, optionally, the thumbnail stored as a JPEG-compressed image. The TIFF structure usually contains two "pages" (IFDs): the first holds the EXIF info and the second holds the thumbnail in the "old" TIFF compression type 6 format, which simply means a JPEG file stored as-is inside a TIFF wrapper. If you want the simplest possible code to extract the thumbnail as a JFIF file, follow these steps:

  1. Familiarize yourself with JFIF and TIFF markers/tags. JFIF markers consist of two bytes: 0xFF followed by the marker type (0xE1 for APP1). These two bytes are followed by the two-byte length stored in big-endian order. For TIFF files, consult the Adobe TIFF 6.0 reference.
  2. Search your JPEG file for the APP1 (FFE1) EXIF marker. There may be multiple APP1 markers and there may be multiple markers before the APP1.
  3. The APP1 marker you're looking for contains the ASCII letters "Exif" followed by two zero bytes, immediately after the length field.
  4. Look for "II" or "MM" (6 bytes after the length field) to determine the endianness used in the TIFF data. II = Intel = little endian, MM = Motorola = big endian.
  5. Skip through the first page's tags to find the second IFD, where the thumbnail is stored; the 4-byte offset to it follows the last 12-byte tag entry of the first IFD. In the second "page", look for the two TIFF tags which point to the JPEG data. Tag 0x201 has the offset of the JPEG data (relative to the II/MM) and tag 0x202 has the length in bytes.
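
The steps above can be sketched in JavaScript roughly as follows. This is a minimal sketch under stated assumptions, not a complete Exif parser: the function names (findExifThumbnail, isExif, readThumbnail) are made up for illustration, the whole APP1 segment is assumed to be in the buffer, tags 0x201 and 0x202 are assumed to be stored as 4-byte LONG values, and the search gives up at the first SOS or EOI marker.

```javascript
// Sketch: locate the Exif thumbnail inside a JPEG held in a Uint8Array.
// Returns a Uint8Array view of the embedded JPEG, or null if none is found.
function findExifThumbnail(bytes) {
    var view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    if (bytes.length < 4 || view.getUint16(0) !== 0xFFD8) return null; // no SOI

    var pos = 2;
    while (pos + 4 <= bytes.length) {
        if (bytes[pos] !== 0xFF) return null;                // lost marker sync
        var marker = bytes[pos + 1];
        if (marker === 0xDA || marker === 0xD9) return null; // SOS/EOI: headers are over
        var segLen = view.getUint16(pos + 2);                // big-endian, counts itself
        if (marker === 0xE1 && isExif(bytes, pos + 4)) {
            // The TIFF header ("II"/"MM") starts 6 bytes after the length field.
            return readThumbnail(view, bytes, pos + 10, pos + 2 + segLen);
        }
        pos += 2 + segLen;                                   // skip to the next marker
    }
    return null;
}

// True if the six bytes at `at` are "Exif" followed by two zero bytes.
function isExif(bytes, at) {
    var id = [0x45, 0x78, 0x69, 0x66, 0x00, 0x00];
    for (var i = 0; i < id.length; i++) {
        if (bytes[at + i] !== id[i]) return false;
    }
    return true;
}

// Walk IFD0 to reach IFD1, then read tags 0x201 (offset) and 0x202 (length).
function readThumbnail(view, bytes, tiff, segEnd) {
    segEnd = Math.min(segEnd, bytes.length);
    if (tiff + 8 > segEnd) return null;
    var order = view.getUint16(tiff);
    if (order !== 0x4949 && order !== 0x4D4D) return null;   // neither "II" nor "MM"
    var le = order === 0x4949;                               // "II" = little endian
    if (view.getUint16(tiff + 2, le) !== 42) return null;    // TIFF magic number

    var ifd0 = tiff + view.getUint32(tiff + 4, le);
    if (ifd0 + 2 > segEnd) return null;
    var nextPtr = ifd0 + 2 + view.getUint16(ifd0, le) * 12;  // skip IFD0's entries
    if (nextPtr + 4 > segEnd) return null;
    var ifd1Offset = view.getUint32(nextPtr, le);
    if (ifd1Offset === 0) return null;                       // no second IFD
    var ifd1 = tiff + ifd1Offset;
    if (ifd1 + 2 > segEnd) return null;

    var count = view.getUint16(ifd1, le);
    var offset = 0, length = 0;
    for (var i = 0; i < count; i++) {
        var entry = ifd1 + 2 + i * 12;                       // each entry is 12 bytes
        if (entry + 12 > segEnd) return null;
        var tag = view.getUint16(entry, le);
        if (tag === 0x0201) offset = view.getUint32(entry + 8, le);
        if (tag === 0x0202) length = view.getUint32(entry + 8, le);
    }
    if (!offset || !length || tiff + offset + length > segEnd) return null;
    return bytes.subarray(tiff + offset, tiff + offset + length);
}
```

In a browser, the result could be wrapped as new Blob([thumb], {type: "image/jpeg"}) once the file has been read into an ArrayBuffer. Every offset is checked against the end of the segment so that a truncated or malformed file yields null rather than an exception.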

Upvotes: 23

Joel

Reputation: 15752

There is a much simpler solution for this problem, but I don't know how reliable it is: Start reading the JPEG file from the third byte and search for FFD8 (start of JPEG image marker), then for FFD9 (end of JPEG image marker). Extract it and voila, that's your thumbnail.

A simple JavaScript implementation:

function getThumbnail(file, callback) {
    if (file.type == "image/jpeg") {
        var reader = new FileReader();
        reader.onload = function (e) {
            var array = new Uint8Array(e.target.result),
                start, end;
            for (var i = 2; i < array.length; i++) {
                if (array[i] == 0xFF) {
                    if (!start) {
                        if (array[i + 1] == 0xD8) {
                            start = i;
                        }
                    } else {
                        if (array[i + 1] == 0xD9) {
                            end = i + 2; // subarray() excludes its end index, so step past the FFD9 marker
                            break;
                        }
                    }
                }
            }
            if (start && end) {
                callback(new Blob([array.subarray(start, end)], {type:"image/jpeg"}));
            } else {
                // TODO scale with canvas
            }
        }
        reader.readAsArrayBuffer(file.slice(0, 50000));
    } else if (file.type.indexOf("image/") === 0) {
        // TODO scale with canvas
    }
}

Upvotes: 8

Samveen

Reputation: 3540

The Wikipedia page on JFIF at http://en.wikipedia.org/wiki/JPEG_File_Interchange_Format gives a good description of the JPEG header (in a JFIF file, the APP0 header can contain the thumbnail as an uncompressed raster image). That should give you an idea of the layout and thus the code needed to extract the info.

Hexdump of an image header (little endian display):

sdk@AndroidDev:~$ head -c 48 stfu.jpg |hexdump
0000000 d8ff e0ff 1000 464a 4649 0100 0101 4800
0000010 4800 0000 e1ff 1600 7845 6669 0000 4d4d
0000020 2a00 0000 0800 0000 0000 0000 feff 1700

Each pair of bytes is swapped in the hexdump display, so the offsets below are given in file order: image magic FFD8 (bytes 0-1), APP0 segment marker FFE0 (bytes 2-3), header length (bytes 4-5, big-endian), header type signature ("JFIF\0" or "JFXX\0", bytes 6-10), version (bytes 11-12), density units (byte 13), X density (bytes 14-15), Y density (bytes 16-17), thumbnail width (byte 18), thumbnail height (byte 19), and finally the rest up to "Header Length" is thumbnail data.

From the above example, you can see that the header length is 16 bytes (bytes 4-5 hold 0x0010) and the version is 01.01 (bytes 11-12). Further, as thumbnail width and thumbnail height are both 0x00, the APP0 header doesn't contain a thumbnail.
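
To make the layout concrete, here is a small JavaScript sketch that decodes the fixed fields of a JFIF APP0 header, applied to the first 20 bytes of the hexdump above written out in file order. The function name parseJfifHeader and the shape of the returned object are illustrative assumptions; the sketch only handles the "JFIF\0" signature, not "JFXX\0".

```javascript
// Sketch: decode the fixed part of a JFIF APP0 header at the start of a JPEG.
// Returns null if the buffer does not begin with SOI + APP0 + "JFIF\0".
function parseJfifHeader(bytes) {
    if (bytes.length < 20) return null;
    var view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    if (view.getUint16(0) !== 0xFFD8 || view.getUint16(2) !== 0xFFE0) return null;
    var id = String.fromCharCode(bytes[6], bytes[7], bytes[8], bytes[9]);
    if (id !== "JFIF" || bytes[10] !== 0) return null;

    var width = bytes[18], height = bytes[19];
    return {
        length: view.getUint16(4),                  // big-endian, counts itself
        version: bytes[11] + "." + (bytes[12] < 10 ? "0" : "") + bytes[12],
        densityUnits: bytes[13],
        xDensity: view.getUint16(14),
        yDensity: view.getUint16(16),
        thumbWidth: width,
        thumbHeight: height,
        thumbBytes: 3 * width * height              // 24-bit RGB starting at offset 20
    };
}

// First 20 bytes of the hexdump, un-swapped into file order.
var header = new Uint8Array([
    0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10, 0x4A, 0x46, 0x49, 0x46,
    0x00, 0x01, 0x01, 0x01, 0x00, 0x48, 0x00, 0x48, 0x00, 0x00
]);
var info = parseJfifHeader(header);
// info.length is 16, info.version is "1.01", and both thumbnail dimensions are 0.
```

A zero-sized thumbnail here only says that the APP0 header carries none; a camera-style thumbnail, if any, would live in the Exif APP1 segment instead.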

Upvotes: -1
