MusiGenesis

Reputation: 75376

Is there an easy way to determine the type of a file without knowing the file's extension?

I have a table with a binary column which stores files of a number of different possible filetypes (PDF, BMP, JPEG, WAV, MP3, DOC, MPEG, AVI etc.), but no columns that store either the name or the type of the original file. Is there any easy way for me to process these rows and determine the type of each file stored in the binary column? Preferably it would be a utility that only reads the file headers, so that I don't have to fully extract each file to determine its type.

Clarification: I know that the approach here involves reading just the beginning of each file. I'm looking for a good resource (aka links) that can do this for me without too much fuss. Thanks.

Also, just C#/.NET on Windows, please. I'm not using Linux and can't use Cygwin (doesn't work on Windows CE, among other reasons).

Upvotes: 7

Views: 3775

Answers (7)

sundar venugopal

Reputation: 3160

Here are a few tools to find the format of a file:

  1. Online File Identifier, a website by Marco Pontello: http://mark0.net/onlinetrid.aspx

  2. File Analyzer, a program by Vadim Tarasov.

The website has the advantage of not requiring any installation, so it is less likely to expose you to malware. However, you have to upload your file, which may be a privacy concern.


Here is an example with the save file of the game Pampas & Selene: The Maze of Demons Demo:

The website identifies the .sav file as TIM (PlayStation graphics).

Upvotes: 8

Scott Dorman

Reputation: 42526

You need to use some P/Invoke interop code to call the SHGetFileInfo function from the Win32 API. This article may also help.

Upvotes: 1

Bob

Reputation: 99814

Someone else asked a similar question and posted the code used to do exactly this. You should be able to take what is posted here, and slightly modify it so that it pulls from your database.

https://stackoverflow.com/questions/58510

In addition, it looks like someone has written a library based on magic numbers to do this. However, the site appears to require registration and some form of alternate access in order to download the library. The documentation is available for free without registration, which may be helpful.

http://software.topcoder.com/catalog/c_component.jsp?comp=13249160&ver=2

Upvotes: 4

thelsdj

Reputation: 9154

The easiest way to do this would be through access to a *nix (or Cygwin) system that has the 'file' command:

$ file visitors.*
visitors.html: HTML document text
visitors.png:  PNG image data, 5360 x 2819, 8-bit colormap, non-interlaced

You could write a C# application that pipes the first X bytes of each binary column to the file command (using - as the file name).

Upvotes: 1

jjnguy

Reputation: 138952

Many file types have well-defined headers at the start of the file. You could read the first few bytes to see how the file begins.
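For the database case in the question, reading only those first bytes might look like the sketch below. It is a minimal illustration, not a tested production routine: `HeaderReader`, `ReadHeader`, `columnIndex`, and `maxBytes` are hypothetical names introduced here, and the code assumes the binary column is exposed through the standard `IDataRecord.GetBytes` method, which copies a requested range of a binary field into a buffer.

```csharp
using System;
using System.Data;

public static class HeaderReader
{
    // Hypothetical helper: returns at most maxBytes from the start of a
    // binary column, without copying the rest of the value into the result.
    public static byte[] ReadHeader(IDataRecord record, int columnIndex, int maxBytes)
    {
        // A NULL column has no header to inspect.
        if (record.IsDBNull(columnIndex))
            return new byte[0];

        byte[] buffer = new byte[maxBytes];

        // GetBytes copies up to maxBytes starting at field offset 0 and
        // returns the number of bytes actually copied.
        long read = record.GetBytes(columnIndex, 0, buffer, 0, maxBytes);

        // Values shorter than maxBytes yield a correspondingly shorter array.
        if (read < maxBytes)
            Array.Resize(ref buffer, (int)read);

        return buffer;
    }
}
```

Whether the provider avoids fetching the whole value depends on how the reader was opened. With ADO.NET providers that support it, `CommandBehavior.SequentialAccess` is the usual way to ask for large binary columns to be streamed rather than loaded in full.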

Upvotes: 1

Fernando Miguélez

Reputation: 11326

The easiest way I know is to use the file command, which is also available on Windows through Cygwin.

Upvotes: 3

Paul Fisher

Reputation: 9666

This is not a complete answer, but a place to start would be a "magic numbers" library. Such a library examines the first few bytes of a file to find its "magic number", which it compares against a list of known values. This is (at least part of) how the file command on Linux systems works.
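To make the idea concrete, here is a minimal C# sketch of such a lookup. It is not any particular library's API: `MagicNumbers`, `Identify`, and `HasBytesAt` are hypothetical names, and the table covers only a handful of well-known signatures for some of the formats mentioned in the question. A real table would be far longer, and a match only suggests a format rather than proving it.

```csharp
using System;
using System.Text;

public static class MagicNumbers
{
    private sealed class Signature
    {
        public readonly string Label;
        public readonly byte[] Bytes;

        public Signature(string label, byte[] bytes)
        {
            Label = label;
            Bytes = bytes;
        }
    }

    // Illustrative subset of signatures expected at offset 0.
    // Longer signatures come first so short ones like "BM" are tried last.
    private static readonly Signature[] Signatures =
    {
        new Signature("OLE2 compound file (legacy DOC/XLS/PPT)",
            new byte[] { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 }),
        new Signature("PDF", new byte[] { 0x25, 0x50, 0x44, 0x46 }),           // "%PDF"
        new Signature("MPEG program stream", new byte[] { 0x00, 0x00, 0x01, 0xBA }),
        new Signature("JPEG", new byte[] { 0xFF, 0xD8, 0xFF }),
        new Signature("MP3 with ID3v2 tag", new byte[] { 0x49, 0x44, 0x33 }),  // "ID3"
        new Signature("BMP", new byte[] { 0x42, 0x4D }),                       // "BM"
    };

    // Returns a label for the first matching signature, or "unknown".
    public static string Identify(byte[] header)
    {
        // RIFF containers (WAV, AVI) share the first 4 bytes and carry
        // their form type at offset 8, so they need a two-part check.
        if (HasBytesAt(header, 0, Encoding.ASCII.GetBytes("RIFF")))
        {
            if (HasBytesAt(header, 8, Encoding.ASCII.GetBytes("WAVE"))) return "WAV";
            if (HasBytesAt(header, 8, Encoding.ASCII.GetBytes("AVI "))) return "AVI";
        }

        foreach (Signature s in Signatures)
        {
            if (HasBytesAt(header, 0, s.Bytes)) return s.Label;
        }
        return "unknown";
    }

    // True when data contains exactly the expected bytes at the given offset.
    private static bool HasBytesAt(byte[] data, int offset, byte[] expected)
    {
        if (data == null || data.Length < offset + expected.Length) return false;
        for (int i = 0; i < expected.Length; i++)
        {
            if (data[offset + i] != expected[i]) return false;
        }
        return true;
    }
}
```

Twelve header bytes are enough for every entry in this table, including the RIFF form type. Formats without a fixed leading signature (for example, MP3 files that lack an ID3 tag) fall through to "unknown" here and would need extra heuristics.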

Upvotes: 6
