Reputation: 1
I am trying to write a C++ program that iterates through a user-specified directory (e.g. /home/alpernick/Pictures). Primarily, this is to ensure that there are no duplicates (checked via md5sum).
But one feature I truly want to include is to ensure that the extension of a filename matches the file's type.
For example, if the file's name is "sunrise.png", I want to ensure that it really is a PNG and not a mislabeled JPEG.
I am approaching this with four functions, as follows.
string extension(string fileName); // Returns the extension of fileName (including .tar.gz handling, so it isn't blindly just returning the last 3 characters)
string fileType(string fileName); // This one is the key: it returns the actual file type, so if the file named fileName is a PNG, fileType() will return PNG, regardless of the return value of extension()
string basename(string fileName); // Returns the basename of the file, i.e. everything before the extension (so for sunset.jpg it would return sunset; for fluffytarball.tar.gz it would return fluffytarball)
string renameFile(string incorrectFileName, string fileNameBeforeExtension, string actualFileType); // Returns a string whose value is the basename concatenated with the correct file extension
string file = "sunset.jpg";
/* Setting file to be hard-coded for illustrative purposes only */
if (extension(file) != fileType(file))
{
    // renameFile() returns a string, so store it as one rather than in a char array
    string fixedName = renameFile(file, basename(file), fileType(file));
    puts(fixedName.c_str());
}
I have zero issues with the string processing. I'm stuck, however, on fileType(). I want this program to run not only on my primary machine (Kubuntu 14.04) but also on a Windows machine. So it seems I need a library, or set of libraries, that is common to both (or at least can be compiled for both).
Any help/advice?
Upvotes: 0
Views: 197
Reputation: 1723
You could try looking at the source code of file: https://github.com/file/file
But as Wikipedia states:
file's position-sensitive tests are normally implemented by matching various locations within the file against a textual database of magic numbers (see the Usage section). This differs from other simpler methods such as file extensions and schemes like MIME.
In most implementations, the file command uses a database to drive the probing of the lead bytes. That database is implemented in a file called magic, whose location is usually in /etc/magic, /usr/share/file/magic or a similar location.
So it does not seem trivial.
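That said, for a handful of common image formats the lead-byte check described in the quote can be sketched by hand. The following is only a minimal illustration, not how file itself is implemented: the names Signature, kSignatures and sniffType are made up for this example, the table covers just PNG, JPEG and GIF, and fileType() simply mirrors the function name from the question. It uses nothing beyond the C++ standard library, so it should build on both Linux and Windows.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical signature table entry (illustrative, far from complete).
struct Signature {
    std::string type;                  // label returned to the caller
    std::vector<unsigned char> magic;  // bytes expected at offset 0
};

// Well-known leading bytes of three image formats.
static const std::vector<Signature> kSignatures = {
    {"PNG",  {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A}},
    {"JPEG", {0xFF, 0xD8, 0xFF}},
    {"GIF",  {'G', 'I', 'F', '8'}},
};

// Classify a buffer holding the first bytes of a file.
std::string sniffType(const std::vector<unsigned char>& head) {
    for (const Signature& sig : kSignatures) {
        if (head.size() >= sig.magic.size() &&
            std::equal(sig.magic.begin(), sig.magic.end(), head.begin())) {
            return sig.type;
        }
    }
    return "unknown";
}

// Read up to 16 leading bytes of the named file and classify them.
// Returns "unknown" if the file cannot be opened or nothing matches.
std::string fileType(const std::string& fileName) {
    std::ifstream in(fileName, std::ios::binary);
    if (!in) {
        return "unknown";
    }
    std::vector<unsigned char> head(16);
    in.read(reinterpret_cast<char*>(head.data()),
            static_cast<std::streamsize>(head.size()));
    head.resize(static_cast<std::size_t>(in.gcount()));  // keep only what was read
    return sniffType(head);
}
```

With this, fileType("sunrise.png") would return "JPEG" for a mislabeled JPEG, and the result can be compared against the extension after normalizing case and spelling (e.g. .jpg vs JPEG). Anything beyond a few fixed-offset signatures is where the magic database earns its keep.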
Upvotes: 1
Reputation: 45654
There are more exceptions than rules for guessing the actual type of a file based on its contents.
This is exacerbated by the fact that a file can be valid and useful when interpreted as two completely different file types.
For a good program that tries to guess from insufficient data, try file on Unixoids.
Upvotes: 2