Reputation: 1

Ensure File Extension Matches File Type in C++

I am trying to write a C++ program that iterates through a user-specified directory (e.g. /home/alpernick/Pictures). Primarily, this is to ensure that there are no duplicates (checked via md5sum).

But one feature I truly want to include is to ensure that the extension of a filename matches the file's type.

For example, if a file is named "sunrise.png", I want to verify that it really is a PNG and not, say, a mislabeled JPEG.

I am approaching this with four functions, as follows.

#include <string>
using std::string;

string extension(string fileName);  // returns the extension of fileName (including .tar.gz handling, so it doesn't blindly return the last 3 characters)
string fileType(string fileName);   // the key function: returns the actual file type, so if fileName is a PNG, fileType() returns PNG regardless of what extension() returns
string basename(string fileName);   // returns the basename, i.e. everything before the extension (sunset.jpg -> sunset; fluffytarball.tar.gz -> fluffytarball)
string renameFile(string incorrectFileName, string fileNameBeforeExtension, string actualFileType);  // returns the basename concatenated with the correct file extension
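
The string processing is already solved, but for completeness here is one possible sketch of extension(), basename() and renameFile(). It assumes bare file names (no directory part), returns extensions with the leading dot, and uses a hypothetical, incomplete kCompoundExtensions list to keep .tar.gz-style extensions together. renameFile() simply lower-cases the type name, so "JPEG" would become ".jpeg" rather than ".jpg". Note that glibc's <string.h> can also declare a C function called basename, so call this one with a std::string (as the question's code does) rather than a string literal.

```cpp
#include <array>
#include <cctype>
#include <string>

// Hypothetical list of multi-part extensions that must be kept together.
const std::array<std::string, 3> kCompoundExtensions = {".tar.gz", ".tar.bz2", ".tar.xz"};

bool endsWith(const std::string& s, const std::string& suffix)
{
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Returns the extension including its leading dot, or "" if there is none.
std::string extension(const std::string& fileName)
{
    for (const auto& ext : kCompoundExtensions)
        if (endsWith(fileName, ext))
            return ext;
    // Otherwise take everything from the last dot on, ignoring a leading dot
    // (hidden files such as ".bashrc" have no extension).
    const auto dot = fileName.find_last_of('.');
    if (dot == std::string::npos || dot == 0)
        return "";
    return fileName.substr(dot);
}

// Everything before the extension: "fluffytarball.tar.gz" -> "fluffytarball".
std::string basename(const std::string& fileName)
{
    return fileName.substr(0, fileName.size() - extension(fileName).size());
}

// Builds "<basename>.<lower-cased type>", e.g. ("sunset", "PNG") -> "sunset.png".
// incorrectFileName is unused; it is kept only to match the question's signature.
std::string renameFile(const std::string& /*incorrectFileName*/,
                       const std::string& fileNameBeforeExtension,
                       const std::string& actualFileType)
{
    std::string ext = actualFileType;
    for (char& c : ext)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return fileNameBeforeExtension + "." + ext;
}
```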

// File name hard-coded for illustrative purposes only (puts() needs <cstdio>)
string file = "sunset.jpg";
if (extension(file) != fileType(file))
{
    // renameFile() returns a string; a char[] cannot be initialized from it
    string fixedName = renameFile(file, basename(file), fileType(file));
    puts(fixedName.c_str());
}

I have zero issues with the string processing. I'm stuck, however, on fileType(). I want this program to run not only on my primary machine (Kubuntu 14.04) but also on a Windows machine. So it seems I need a library or set of libraries that is common to both (or at least available for both platforms).
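
For a handful of well-known formats, fileType() does not necessarily need a third-party library: many formats start with a fixed "magic" byte sequence, and reading those bytes needs only the standard library, so the same code builds on Linux and Windows. This is a minimal sketch under that assumption; the signature list is deliberately tiny, typeFromHeader() is a hypothetical helper, and anything unrecognised is reported as "UNKNOWN".

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

struct Signature
{
    std::string type;                 // name returned by fileType()
    std::vector<unsigned char> bytes; // bytes expected at the start of the file
};

// A few well-known leading signatures (not an exhaustive list).
const std::vector<Signature> kSignatures = {
    {"PNG",  {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A}},
    {"JPEG", {0xFF, 0xD8, 0xFF}},
    {"GIF",  {'G', 'I', 'F', '8', '7', 'a'}},
    {"GIF",  {'G', 'I', 'F', '8', '9', 'a'}},
    {"PDF",  {'%', 'P', 'D', 'F', '-'}},
};

// Classify an in-memory header (kept separate so it is easy to test).
std::string typeFromHeader(const std::vector<unsigned char>& header)
{
    for (const auto& sig : kSignatures)
        if (header.size() >= sig.bytes.size() &&
            std::equal(sig.bytes.begin(), sig.bytes.end(), header.begin()))
            return sig.type;
    return "UNKNOWN";
}

// Read the first few bytes of fileName and classify them.
std::string fileType(const std::string& fileName)
{
    std::ifstream in(fileName, std::ios::binary); // binary mode matters on Windows
    if (!in)
        return "UNKNOWN";
    std::vector<unsigned char> header(16);
    in.read(reinterpret_cast<char*>(header.data()),
            static_cast<std::streamsize>(header.size()));
    // The file may be shorter than 16 bytes; keep only what was actually read.
    header.resize(static_cast<std::size_t>(in.gcount()));
    return typeFromHeader(header);
}
```

In the question's check, extension(file) returns something like ".jpg" while this fileType() returns "JPEG", so the comparison still needs a mapping between the two (both ".jpg" and ".jpeg" correspond to JPEG).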

Any help/advice?

Upvotes: 0

Answers (2)

zoska

Reputation: 1723

You could try looking at the source code of the file utility: https://github.com/file/file

But as Wikipedia states:

file's position-sensitive tests are normally implemented by matching various locations within the file against a textual database of magic numbers (see the Usage section). This differs from other simpler methods such as file extensions and schemes like MIME.

In most implementations, the file command uses a database to drive the probing of the lead bytes. That database is implemented in a file called magic, whose location is usually in /etc/magic, /usr/share/file/magic or a similar location.

So reimplementing it does not seem trivial.
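
The "position-sensitive tests" in the quote can be illustrated with a small table-driven matcher: each rule is a set of (offset, bytes) tests that must all match, which is how formats sharing the same first bytes are told apart. For example, WebP, WAV and AVI files all begin with "RIFF" and differ in the four bytes at offset 8. The rules below are a hand-written illustration, not entries from the real magic database, and MagicTest, MagicRule and classify() are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct MagicTest
{
    std::size_t offset; // where in the file to look
    std::string bytes;  // what must be found there
};

struct MagicRule
{
    std::string type;
    std::vector<MagicTest> tests; // every test must match
};

// Checked in order, so the generic RIFF fallback must come last.
const std::vector<MagicRule> kRules = {
    {"WEBP", {{0, "RIFF"}, {8, "WEBP"}}},
    {"WAV",  {{0, "RIFF"}, {8, "WAVE"}}},
    {"AVI",  {{0, "RIFF"}, {8, "AVI "}}},
    {"RIFF (unknown subtype)", {{0, "RIFF"}}},
};

bool matches(const std::string& data, const MagicTest& t)
{
    return data.size() >= t.offset + t.bytes.size() &&
           data.compare(t.offset, t.bytes.size(), t.bytes) == 0;
}

// data holds the first bytes of a file; returns the first rule that fully matches.
std::string classify(const std::string& data)
{
    for (const auto& rule : kRules)
    {
        bool all = true;
        for (const auto& t : rule.tests)
            all = all && matches(data, t);
        if (all)
            return rule.type;
    }
    return "UNKNOWN";
}
```

The real magic file contains thousands of such entries (plus indirect offsets, masks and string tests), which is why reusing file or its library is usually preferable to maintaining a table by hand.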

Upvotes: 1

Deduplicator

Reputation: 45654

There are more exceptions than rules for guessing the actual type of a file based on its contents.

This is exacerbated by the fact that a file can be valid and useful when interpreted as two completely different file types.
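
A common case of this: .docx, .xlsx, .pptx, .odt, .jar and .epub files are all ZIP archives and start with the same "PK\x03\x04" signature, so a content sniffer can only say "ZIP", and renaming a .docx to .zip would be wrong. One way to cope, sketched below, is to accept any extension from a set of extensions compatible with the sniffed type instead of requiring a single exact match. The table is hypothetical and incomplete, extensions are assumed to be lower-case with a leading dot, and extensionPlausible() is a hypothetical name.

```cpp
#include <map>
#include <set>
#include <string>

// Sniffed type -> extensions that are legitimately used for it.
const std::map<std::string, std::set<std::string>> kCompatibleExtensions = {
    {"ZIP",  {".zip", ".docx", ".xlsx", ".pptx", ".odt", ".jar", ".epub"}},
    {"JPEG", {".jpg", ".jpeg", ".jpe"}},
    {"PNG",  {".png"}},
};

// True if ext is plausible for the sniffed type. Types missing from the table
// are treated as "can't tell" rather than as a mismatch.
bool extensionPlausible(const std::string& sniffedType, const std::string& ext)
{
    const auto it = kCompatibleExtensions.find(sniffedType);
    if (it == kCompatibleExtensions.end())
        return true;
    return it->second.count(ext) > 0;
}
```

Only files that fail this check would be candidates for renaming, which avoids "fixing" files that are merely using a more specific extension than the container format.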

For a good program that tries to guess from insufficient data, try file on Unix-like systems.

Upvotes: 2
