Reputation: 10643
How would I check that a file is either JPEG, PDF or TIFF? And I mean actually checking, not just going by MIME type and file extension.
I have access to the raw file data (this check is part of an uploader) and I need to verify that the files are either JPEG, PDF or TIFF. I assume I would have to check for some sort of headers in the files, but I have no idea what to look for or where to start.
Upvotes: 7
Views: 3715
Reputation: 11730
There is no sure-fire way to be certain, but the first few bytes of a file are its signature/fingerprint for file handlers to test. See https://en.wikipedia.org/wiki/List_of_file_signatures
Every file type can vary considerably, and some allow for variable or shifting headers (at one time PDF did not mandate that the 40-bit signature come first). With that degree of uncertainty, we can assume the following values, commonly called "magic numbers" and shown here Base64-encoded, as representing the start of each byte stream.
So, in general, to answer for the requested types:
/9j/4 in Base64 format (JPEG, bytes FF D8 FF followed by an E0 to E3 marker)
JVBER in Base64 format (PDF, "%PDF-")
iVBOR in Base64 format (PNG)
Just for good measure, here is the related older GIF sequence:
R0lGO
as Base64. Also, we can see the first 8 bits are 01000111 for G.
Thus, in all the above cases just the first byte would already be a very good indicator, with no need for the full magic string. But ZIP-based formats such as docx, pptx, xlsx and cbz all share the same magic number:
UEsDB
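Finally, the last requested type above was TIFF, which comes in two byte orders, Intel (little-endian) or Motorola (big-endian), so you need to test for both:
SUkqA (Intel, "II*" followed by a zero byte)
TU0AK (Motorola, "MM", a zero byte, then "*")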
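As a rough sketch of the byte tests described above, the raw upload data could be compared against the known leading bytes for the three requested types. The function name `detectUploadType` and the `$signatures` table are made up for illustration; the byte values are the usual JPEG, PDF and TIFF signatures, and a PDF whose header is not at offset 0 would be missed by this check.

```php
<?php
// Hypothetical helper: returns 'jpeg', 'pdf' or 'tiff', or null for anything else.
function detectUploadType(string $data): ?string
{
    // Leading bytes ("magic numbers") for each accepted type.
    $signatures = [
        'jpeg' => ["\xFF\xD8\xFF"],
        'pdf'  => ["%PDF-"],
        'tiff' => ["II\x2A\x00", "MM\x00\x2A"], // Intel and Motorola byte order
    ];

    foreach ($signatures as $type => $prefixes) {
        foreach ($prefixes as $prefix) {
            // strncmp() is binary safe, so NUL bytes in the prefix are fine.
            if (strncmp($data, $prefix, strlen($prefix)) === 0) {
                return $type;
            }
        }
    }

    return null;
}
```

Only the first few bytes are needed, so passing the first 8 bytes of the uploaded file is enough. A matching signature still does not prove that the rest of the file is valid.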
Upvotes: 0
Reputation: 6356
You need to implement byte sequence tests.
Here is a guide to checking byte sequences for the most common image formats.
Upvotes: 1
Reputation: 10841
exif_imagetype is very useful for this: https://www.php.net/manual/en/function.exif-imagetype.php
It scans the initial bytes of the file to determine the graphic type. It supports a large number of graphic formats (and returns false if it doesn't recognize the format).
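A minimal sketch of how that could look for the JPEG and TIFF part of the question, assuming the exif extension is enabled. The helper name `isJpegOrTiff` is hypothetical; `exif_imagetype()` and the `IMAGETYPE_*` constants are the ones from the PHP manual. PDF is not an image type, so it has to be checked separately.

```php
<?php
// Hypothetical helper: true if exif_imagetype() reports JPEG or TIFF.
function isJpegOrTiff(string $path): bool
{
    // exif_imagetype() returns false (and may raise a notice) when the
    // file is unreadable, too short, or not a recognised image format.
    $type = @exif_imagetype($path);

    return in_array(
        $type,
        [IMAGETYPE_JPEG, IMAGETYPE_TIFF_II, IMAGETYPE_TIFF_MM],
        true // strict comparison, so false never matches
    );
}
```

For an uploader, the path would typically be the temporary file that PHP created for the upload.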
Upvotes: 3
Reputation: 5337
To check for image types you can use the exif_imagetype function. For PDF, you have to open the file, read the first bytes and check whether it starts with '%PDF':
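// Open in binary mode and read exactly the first four bytes.
$fp = fopen($pdf, 'rb');
if ($fp !== false) {
    // fgets($fp, 4) would return at most 3 bytes, so use fread() instead.
    if (fread($fp, 4) === '%PDF') {
        // ... is pdf
    }
    fclose($fp);
}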
Upvotes: 0
Reputation: 3806
This can be tricky, since it relies on every file following its format's specification and having the "magic number" present, which is basically a header for the format.
I found this wiki-page about different signatures: http://en.wikipedia.org/wiki/List_of_file_signatures
So in the best case scenario you just need to validate these first bytes.
Upvotes: 1
Reputation: 1090
If you have access to the raw file, you can check the file header for its magic number. This number identifies the type of file.
Upvotes: 0