Hinchy

Reputation: 683

Compare file extension to file header

I'm starting to design an application that will, in part, run through a directory of files and compare their extensions to their file headers.

Does anyone have any advice as to the best way to approach this? I know I could simply use a lookup table containing each file type's header signature, e.g. JPEG: \xFF\xD8\xFF\xE0

I was hoping there might be a simpler way.

Thanks in advance for your help.

Upvotes: 1

Views: 5079

Answers (5)

Chetan Pardeshi

Reputation: 39

You can determine a file's type from its header using Apache Tika.
The following code needs the Apache Tika jar on the classpath.

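import java.io.BufferedInputStream;
import java.io.InputStream;
import org.apache.tika.detect.Detector;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AutoDetectParser;

// The detector needs a stream that supports mark/reset, hence the BufferedInputStream.
try (InputStream is = MainApp.class.getResourceAsStream("/NetFx20SP1_x64.txt");
     BufferedInputStream bis = new BufferedInputStream(is)) {
  AutoDetectParser parser = new AutoDetectParser();
  Detector detector = parser.getDetector();
  // The resource name is passed as an additional hint alongside the content.
  Metadata md = new Metadata();
  md.add(Metadata.RESOURCE_NAME_KEY, MainApp.class.getResource("/NetFx20SP1_x64.txt").getPath());
  MediaType mediaType = detector.detect(bis, md);

  System.out.println("MIME type of file: " + mediaType);
}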

Upvotes: 0

Colin Hebert

Reputation: 93197

You can extract the MIME type of each file and compare it against a map from MIME type to extensions (a Map<String, List<String>>, where the key is the MIME type and the value is the list of valid extensions for it).
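The comparison step might be sketched as follows. This assumes the MIME type has already been detected from the file's content by some other means (a detection library or an external tool). The class name, the helper methods, and the three map entries are illustrative assumptions, not part of any library.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: checks a file name's extension against a detected MIME type.
final class MimeExtensionCheck {
    // Illustrative entries only; a real table would cover every type you care about.
    private static final Map<String, List<String>> EXTENSIONS_BY_MIME = Map.of(
        "image/jpeg", List.of("jpg", "jpeg", "jpe"),
        "image/png", List.of("png"),
        "application/pdf", List.of("pdf"));

    // Returns the lower-cased extension, or "" if the name has no dot.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // True when the detected MIME type is in the table and lists the file's extension.
    static boolean extensionMatches(String detectedMimeType, String fileName) {
        List<String> valid = EXTENSIONS_BY_MIME.get(detectedMimeType);
        return valid != null && valid.contains(extensionOf(fileName));
    }
}
```

A MIME type that is missing from the table is reported as a mismatch here; depending on the application it may be better to treat that case as "unknown" instead.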


Upvotes: 0

Arne Burmeister

Reputation: 20614

Because some file types lack a distinctive header (thanks @Michael), I would create a map from extension to a kind of type checker with a simple API like

public interface TypeCheck {
  // The throws clause belongs on the method, not on the interface declaration.
  boolean isValid(InputStream data) throws IOException;
}

Now you can code something like

File toBeTested = ...;
Map<String,TypeCheck> typeCheckByExtension = ...;
TypeCheck check = typeCheckByExtension.get(getExtension(toBeTested.getName()));
if (check != null) {
  // try-with-resources closes the stream even if isValid throws
  try (InputStream in = new FileInputStream(toBeTested)) {
    if (check.isValid(in)) {
      // process valid file
    } else {
      // process invalid file
    }
  }
} else {
  // process unknown file
}

The header check for JPEG, for example, might look like

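import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class JpegTypeCheck implements TypeCheck {
  // Java bytes are signed, so values above 0x7F need an explicit cast.
  // Note: 0xE0 marks JFIF; other JPEG variants (e.g. Exif, 0xE1) differ in the fourth byte.
  private static final byte[] HEADER = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};

  public boolean isValid(InputStream data) throws IOException {
    byte[] header = new byte[4];
    // A single read() may return fewer bytes than requested; a short read counts as invalid here.
    return data.read(header) == 4 && Arrays.equals(header, HEADER);
  }
}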

For types with no distinctive header, you can implement entirely different type checks.

Upvotes: 0

Jack

Reputation: 133669

If you don't need to do the dirty work on these values yourself (and you aren't on Linux), you could simply use an external program such as TrID, which can do this for you.

Maybe you can just work with its output without having to do the detection yourself. In any case, if you only have around 20 kinds of files to manage, a simple lookup table (e.g. HashMap<String,byte[]>) is not that bad. Of course, this only works if the desired file format has a magic number; otherwise you are on your own (or back to an external program).
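A lookup table of that shape could be sketched roughly like this. The class and method names are hypothetical, and the three signatures are just a starting set; the JPEG entry uses only the first three bytes because the fourth byte varies between JPEG variants.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup table from lower-case extension to leading magic bytes.
final class MagicNumberTable {
    private static final Map<String, byte[]> MAGIC_BY_EXTENSION = new HashMap<>();
    static {
        // Casts are needed for values above 0x7F because Java bytes are signed.
        MAGIC_BY_EXTENSION.put("jpg", new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF});
        MAGIC_BY_EXTENSION.put("png",
            new byte[] {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A});
        MAGIC_BY_EXTENSION.put("pdf", new byte[] {'%', 'P', 'D', 'F'});
    }

    // True if a signature is registered for this extension.
    static boolean hasSignature(String extension) {
        return MAGIC_BY_EXTENSION.containsKey(extension.toLowerCase(Locale.ROOT));
    }

    // True if the extension is registered and fileStart begins with its signature.
    static boolean headerMatches(String extension, byte[] fileStart) {
        byte[] magic = MAGIC_BY_EXTENSION.get(extension.toLowerCase(Locale.ROOT));
        if (magic == null || fileStart.length < magic.length) {
            return false;
        }
        return Arrays.equals(magic, Arrays.copyOf(fileStart, magic.length));
    }
}
```

The caller would read the first few bytes of each file (eight are enough for this table) and pass them in together with the extension, using hasSignature to separate "no magic number known" from "header does not match".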

Upvotes: 0

Michael Borgwardt

Reputation: 346536

I'm afraid it'll have to be more complicated than that. Not every file type has a header at all, and some (such as ZIP, whose central directory sits at the end of the archive) have their characteristic data structures at the end rather than at the beginning.
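A trailing structure can still be checked, just from the other end of the file. As one illustration, a ZIP archive ends with an end-of-central-directory record that starts with the bytes 50 4B 05 06, is at least 22 bytes long, and may be followed by a comment of up to 65535 bytes. The sketch below looks for that signature in the tail of a file; the class and method names are made up for this example, and it is a heuristic rather than a full ZIP validation.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

// Hypothetical checker for a signature located near the end of a file.
final class ZipTrailerCheck {
    private static final int EOCD_MIN_SIZE = 22;   // fixed part of the record
    private static final int MAX_COMMENT = 0xFFFF; // comment length is a 16-bit field

    // Reads only the tail of the file, where the record must be if it exists.
    static boolean hasEndOfCentralDirectory(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            if (length < EOCD_MIN_SIZE) {
                return false;
            }
            int tailSize = (int) Math.min(length, EOCD_MIN_SIZE + MAX_COMMENT);
            byte[] tail = new byte[tailSize];
            raf.seek(length - tailSize);
            raf.readFully(tail);
            return containsSignature(tail);
        }
    }

    // Scans backwards for 50 4B 05 06, leaving room for the 22-byte record.
    static boolean containsSignature(byte[] tail) {
        for (int i = tail.length - EOCD_MIN_SIZE; i >= 0; i--) {
            if (tail[i] == 0x50 && tail[i + 1] == 0x4B
                    && tail[i + 2] == 0x05 && tail[i + 3] == 0x06) {
                return true;
            }
        }
        return false;
    }
}
```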

You may want to take a look at the Unix file command, which does the same job.

Upvotes: 2
