AnC

Reputation: 4201

Determining whether a MIME type is binary or text-based

Is there a library that can determine whether a given content type is binary or text-based?

Obviously text/* is always textual, but for things like application/json, image/svg+xml or even application/x-latex it's rather tricky without inspecting the actual data.

Upvotes: 8

Views: 3473

Answers (3)

W.P. McNeill

Reputation: 17066

I don't know of a definitive list of binary and non-binary MIME types, but for the common MIME types I think the following does pretty well.

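def is_binary(mime_type, subtype):
    """Guess whether a MIME type, split into type and subtype, is binary."""
    # Everything under text/* is textual.
    if mime_type == "text":
        return False
    # Outside application/*, assume binary (so image/svg+xml counts as binary here).
    if mime_type != "application":
        return True
    # Known textual application/* subtypes; anything else is treated as binary.
    return subtype not in {"json", "ld+json", "x-httpd-php", "x-sh", "x-csh", "xhtml+xml", "xml"}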
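
For the cases the question mentions (image/svg+xml, application/x-latex), one possible extension is to also look at structured syntax suffixes such as +xml and +json. This is only a rough sketch: the name is_textual and both lists below are my own illustrative assumptions, not an exhaustive or authoritative registry.

```python
# Illustrative lists only; extend them for the types you actually handle.
TEXT_SUFFIXES = ("+json", "+xml")
TEXT_APPLICATION_SUBTYPES = {
    "json", "xml", "javascript", "x-sh", "x-csh", "x-httpd-php", "x-latex",
}


def is_textual(content_type: str) -> bool:
    """Guess from the content type string alone whether the payload is text."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    essence = content_type.split(";", 1)[0].strip().lower()
    mime_type, _, subtype = essence.partition("/")
    if mime_type == "text":
        return True
    # A +xml or +json suffix signals a text-based syntax regardless of top-level type.
    if subtype.endswith(TEXT_SUFFIXES):
        return True
    return mime_type == "application" and subtype in TEXT_APPLICATION_SUBTYPES
```

It takes the full content type string rather than a pre-split pair, so a header value like "application/json; charset=utf-8" can be passed in directly. Anything not recognized is treated as binary, which is a guess rather than a guarantee.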

Upvotes: 5

synthesizerpatel

Reputation: 28036

There's a Python wrapper for libmagic, pymagic. That's the easiest way to accomplish what you want. Keep in mind that magic is only as good as its fingerprints: you can get false positives if something 'looks' like another file format, but in most cases pymagic will give you what you need.

One thing to watch out for is the 'simple solution' of checking whether any characters fall outside the printable ASCII range: you will likely encounter Unicode text, whose encoded bytes look like binary (and in fact are binary) even though the content is just text.
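
If you do end up inspecting the data yourself, a somewhat safer heuristic than the ASCII range check is to try decoding the bytes. A minimal sketch, assuming the text is expected to be UTF-8; the function name looks_like_text is hypothetical and the NUL-byte rule is just a common convention:

```python
def looks_like_text(data: bytes, encoding: str = "utf-8") -> bool:
    """Heuristic: treat data as text if it has no NUL bytes and decodes cleanly."""
    # NUL bytes are rare in text files but common in binary formats.
    if b"\x00" in data:
        return False
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True
```

This accepts non-ASCII text such as "héllo" encoded as UTF-8, which the printable-ASCII check would reject. It will still misjudge some inputs: UTF-16 text contains NUL bytes, and a sample cut off in the middle of a multi-byte character fails to decode.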

Upvotes: 2

amphetamachine

Reputation: 30623

Usually, programs that determine the MIME type will also tell you the character set. For instance, file(1) (and the corresponding libmagic) gives the following output:

> file --mime-encoding /bin/ls
/bin/ls: binary
> file --mime-encoding /etc/passwd
/etc/passwd: us-ascii
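
If you want to use this from a script, something along these lines might work. It is a sketch that assumes file(1) is installed and on the PATH; the helper names mime_encoding and is_binary_encoding are made up for illustration.

```python
import subprocess


def is_binary_encoding(encoding: str) -> bool:
    """file(1) reports the encoding "binary" for non-text data."""
    return encoding.strip() == "binary"


def mime_encoding(path: str) -> str:
    """Return the encoding reported by `file --brief --mime-encoding PATH`."""
    # --brief suppresses the leading "PATH: " prefix in the output.
    result = subprocess.run(
        ["file", "--brief", "--mime-encoding", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

For example, is_binary_encoding(mime_encoding("/bin/ls")) should be true on a system where file reports binary for /bin/ls as shown above.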

Upvotes: 2
