mpen

Reputation: 283313

Determine file type from data?

I have data arriving through a byte stream, and I want to determine its file type so I know how to parse it. At present I'm only concerned with HTML and images; everything else can be discarded.

What's an efficient method of differentiating between the two? And what if I want to expand this to include other file types?

Upvotes: 1

Views: 662

Answers (2)

steinar

Reputation: 9663

This Stack Overflow question discusses the same problem and is tagged with Python (although the problem itself is language-independent). The answers there mention an article on file type signatures (not signatures in the strict sense, but the magic numbers that files of known types commonly start with). For security reasons, if the detected type is going to control your application logic in a non-trivial way, I would recommend accepting the stream only from a trusted source.
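To illustrate the magic-number idea, here is a minimal Python sketch. The function name `sniff_type`, the signature table and the list of HTML markers are my own choices for the example, not taken from the linked article; the table only covers a handful of widely documented image formats. HTML has no magic number, so the sketch falls back to looking for a typical opening tag.

```python
# Hypothetical signature table: a few well-known image magic numbers.
IMAGE_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"BM": "image/bmp",
}

# Assumed heuristic: HTML usually opens with one of these (case-insensitive).
HTML_MARKERS = (b"<!doctype html", b"<html", b"<head", b"<body")

UTF8_BOM = b"\xef\xbb\xbf"


def sniff_type(head: bytes):
    """Guess a MIME type from the first bytes of a stream, or return None."""
    for signature, mime in IMAGE_SIGNATURES.items():
        if head.startswith(signature):
            return mime
    # Skip an optional UTF-8 byte order mark and leading whitespace.
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    text = head.lstrip().lower()
    if text.startswith(HTML_MARKERS):
        return "text/html"
    return None
```

You would read the first few dozen bytes of the stream, pass them to `sniff_type`, and discard the data when it returns `None`. Supporting another file type is then a matter of adding its signature to the table.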

Also, since at the moment you're only checking whether the data is HTML or binary, you could check the byte stream for a zero byte (the byte 0x00, not the character '0'), or for any other byte that is illegal in HTML (e.g. 0x01).
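A rough Python sketch of that check, assuming the HTML is in an ASCII-compatible encoding such as UTF-8. The name `looks_binary` and the exact set of rejected bytes are assumptions made for the example.

```python
# Control bytes that should not appear in HTML text: everything below 0x20
# except tab, line feed, form feed and carriage return (an assumed cut-off).
SUSPICIOUS = bytes(set(range(0x20)) - {0x09, 0x0A, 0x0C, 0x0D})


def looks_binary(chunk: bytes) -> bool:
    """Return True if the chunk contains a NUL or other control byte."""
    return any(byte in SUSPICIOUS for byte in chunk)
```

Keep in mind that HTML encoded as UTF-16 contains zero bytes, so this heuristic would wrongly classify it as binary.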

Upvotes: 1

Ignacio Vazquez-Abrams

Reputation: 799430

There is a wrapper for libmagic out there, but I don't know whether it is still maintained or working.

Upvotes: 1
