Reputation: 265
I've made a script that extracts JPGs from any file using JPEG magic numbers (data starts with FFD8, ends with FFD9).
However, that's not enough: many data segments that match the JPEG magic numbers are not actual JPEGs, just other random bits of data, and they throw an error if you attempt to open them as JPEGs.
What are some additional byte checks that can be done to verify the validity of a JPEG file (markers that will exist in EVERY JPEG)?
Upvotes: 0
Views: 3771
Reputation: 265
A very reliable check I found was to find the start-of-frame (SOF) marker, ff c0 or ff c2 (full table here).
From there, collect bytes until the header ends at the start of the next definition segment: ff c4, ff db, or ff dd.
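Now you have a frame header. The first two bytes give the length of the header (big-endian, counting the two length bytes themselves). Confirm that the number of bytes collected matches that value. The header length is 8 + 3 × (number of components), so it is usually 17 for a three-component colour image and 11 for a grayscale one.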
If the length between ff c0 (or ff c2) and ff c4 (or ff db, or ff dd) does not match the value of the first two bytes, or if no such byte sequence is found, then it's not a valid JPEG.
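A minimal Python sketch of this check, under a few assumptions: the function name `sof_header_is_consistent` is made up for illustration, and instead of scanning forward for ff c4, ff db, or ff dd it jumps ahead by the declared length and only checks that some marker (an ff byte) starts there. It also compares the length against 8 + 3 × the component count stored in the header.

```python
import struct

SOF_MARKERS = (b"\xff\xc0", b"\xff\xc2")  # baseline and progressive start-of-frame


def sof_header_is_consistent(data: bytes) -> bool:
    """Return True if some SOF0/SOF2 segment has a self-consistent length."""
    for marker in SOF_MARKERS:
        pos = data.find(marker)
        while pos != -1:
            start = pos + 2  # first byte of the two-byte length field
            if start + 8 <= len(data):
                (length,) = struct.unpack(">H", data[start:start + 2])
                components = data[start + 7]  # component count follows precision, height, width
                end = start + length  # where the next marker should begin
                if (length == 8 + 3 * components
                        and end + 1 < len(data)
                        and data[end] == 0xFF):
                    return True
            pos = data.find(marker, pos + 1)  # try the next candidate
    return False
```

Because ff c0 can also turn up by chance inside compressed data, the loop keeps trying later candidates rather than giving up on the first mismatch.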
Upvotes: 0
Reputation: 21617
The answer depends upon the level of checking you want to do. Every JPEG stream should have an SOI marker at the start and an EOI marker at the end. In theory there could be data after the EOI marker that is outside the JPEG image.
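As a rough illustration of that first level, here is a small Python sketch. The helper name `check_soi_eoi` is hypothetical, and taking the last ff d9 in the buffer as the EOI is an assumption: trailing data could contain those two bytes by coincidence.

```python
def check_soi_eoi(data: bytes):
    """Return (ok, trailing): ok is False without SOI/EOI; trailing counts bytes after the last EOI."""
    if not data.startswith(b"\xff\xd8"):  # SOI must be the first two bytes
        return False, 0
    eoi = data.rfind(b"\xff\xd9")  # last EOI candidate in the buffer
    if eoi < 2:  # not found, or overlapping the SOI
        return False, 0
    return True, len(data) - (eoi + 2)
```

A non-zero second value means there are bytes after the EOI, which the caller can trim or treat as suspicious.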
The next level down, you could check if there is an SOFx marker. There should be just one.
Then you could make sure there are sufficient SOS markers. In a sequential JPEG, there should be one per component (or a single interleaved scan that covers several components). For a progressive JPEG, you need to do quite a bit more checking.
Then you could check that all the DHT and DQT markers required by the SOS markers are present.
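The SOFx, SOS, DHT, and DQT checks can be sketched in Python by walking the segments one by one. This is a simplified version under stated assumptions: the names `list_markers` and `looks_like_jpeg` are hypothetical, the stream is assumed to be Huffman-coded and to carry its own tables, and the code only checks that at least one DQT and one DHT precede the first SOS rather than matching table IDs against each scan.

```python
import struct

STANDALONE = {0x01, *range(0xD0, 0xD9)}  # TEM, RST0-RST7 and SOI carry no length field
SOF = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}  # SOFx, excluding DHT, JPG, DAC


def list_markers(data: bytes):
    """Return marker codes from just after SOI through EOI, or None if the walk breaks."""
    if data[:2] != b"\xff\xd8":
        return None
    markers, pos = [], 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            return None  # expected a marker here
        code = data[pos + 1]
        if code == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        markers.append(code)
        if code == 0xD9:  # EOI
            return markers
        pos += 2
        if code in STANDALONE:
            continue
        if pos + 2 > len(data):
            return None
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        if length < 2:
            return None
        pos += length  # skip the segment payload
        if code == 0xDA:  # SOS: skip entropy-coded data up to the next real marker
            while pos + 1 < len(data):
                nxt = data[pos + 1]
                # ff 00 is a stuffed byte and ff d0-d7 are restart markers: both stay in the scan
                if data[pos] == 0xFF and nxt not in (0x00, 0xFF) and not 0xD0 <= nxt <= 0xD7:
                    break
                pos += 1
    return None  # ran out of data before EOI


def looks_like_jpeg(data: bytes) -> bool:
    """Structural check: exactly one SOFx, plus DQT and DHT before the first SOS."""
    markers = list_markers(data)
    if markers is None or 0xDA not in markers:
        return False
    before_scan = markers[:markers.index(0xDA)]
    sof_count = sum(1 for m in markers if m in SOF)
    return sof_count == 1 and 0xDB in before_scan and 0xC4 in before_scan
```

Passing this check does not guarantee the image decodes; it only rejects candidates whose segment structure falls apart.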
Finally, you could check the scan data, which requires decompressing the image.
Upvotes: 3