alpernick

Reputation: 1

Ensure File Extension Matches File Type in C++

Hi all. I am trying to write a C++ program that iterates through a user-specified directory (e.g. /home/alpernick/Pictures). Primarily, this is to ensure that there are no duplicates (checked via md5sum).

But one feature I truly want to include is to ensure that the extension of a filename matches the file's type.

For example, if the file's name is "sunrise.png", I want to ensure that it really is a PNG and not, say, a mislabeled JPEG.

I am approaching this with four functions (extension(), fileType(), renameFile() and basename()), as follows:

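string file = "sunset.jpg";
/* Setting file to be hard-coded for illustrative purposes only */
if (extension(file) != fileType(file))
{
    // A char array cannot be initialized from a function's return value,
    // so keep the new name in a string and pass its C string to puts().
    string fixedName = renameFile(file, basename(file), fileType(file));
    puts(fixedName.c_str());
}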

I have zero issues with the string processing. I'm stuck, however, on fileType(). I want this program to run not only on my primary machine (Kubuntu 14.04) but also on Windows. So it seems I need some library or set of libraries that is common to both (or at least available for both).

Any help/advice?

Upvotes: 0

Views: 197

Answers (2)

zoska

Reputation: 1723

You could try looking at the source code of the file command: https://github.com/file/file

But as Wikipedia states:

file's position-sensitive tests are normally implemented by matching various locations within the file against a textual database of magic numbers (see the Usage section). This differs from other simpler methods such as file extensions and schemes like MIME.

In most implementations, the file command uses a database to drive the probing of the lead bytes. That database is implemented in a file called magic, whose location is usually in /etc/magic, /usr/share/file/magic or a similar location.

So reimplementing this yourself does not seem trivial.

Upvotes: 1

Deduplicator

Reputation: 45654

There are more exceptions than rules for guessing the actual type of a file based on its contents.

This is exacerbated by the fact that a single file can be valid and useful when interpreted as two completely different file types.

For a good program that tries to guess from insufficient data, try the file command on Unixoids.

Upvotes: 2
