daniels

Reputation: 19223

How to check if a file contains plain text?

I have a folder full of files and I want to search for a string inside them. The problem is that some of the files may be zip, exe, ogg, etc. Is there a way to check what kind of file each one is, so that I only open and search through txt, PHP, and similar files? I can't rely on the file extension.

Upvotes: 7

Views: 10187

Answers (4)

serup

Reputation: 3830

Try something like this:

def is_binary_file(filepathname):
    # Bytes that commonly appear in text: some control characters, printable ASCII, and 0x80-0xFF
    textchars = bytearray([7,8,9,10,12,13,27]) + bytearray(range(0x20, 0x7f)) + bytearray(range(0x80, 0x100))
    # Delete every text byte; whatever is left over is binary data
    is_binary_string = lambda data: bool(data.translate(None, textchars))

    # Only the first 1024 bytes are inspected
    with open(filepathname, 'rb') as f:
        return is_binary_string(f.read(1024))

Use the function like this:

is_binary_file('<your file path name>')

This returns True if the file looks binary and False if it looks like text. It should be easy to adapt to your needs, e.g. by writing an is_text_file function. I leave that up to you.
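
For what it's worth, here is a minimal sketch of the inverse check plus a folder search built on it. The names is_text_file, find_in_text_files, and sample_size are made up for illustration, and the byte-based heuristic is the same rough one as above, so treat it as a starting point rather than a reliable detector.

```python
import os

# Bytes that commonly appear in text: a few control characters, printable ASCII,
# and the high range 0x80-0xFF (so Latin-1 or UTF-8 text is not rejected).
TEXT_CHARS = bytes([7, 8, 9, 10, 12, 13, 27]) + bytes(range(0x20, 0x7F)) + bytes(range(0x80, 0x100))


def is_text_file(path, sample_size=1024):
    """Return True if the first sample_size bytes contain only text-like bytes."""
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    # translate(None, TEXT_CHARS) deletes every text-like byte; anything left is binary.
    return not sample.translate(None, TEXT_CHARS)


def find_in_text_files(folder, needle):
    """Yield paths of text-like files under folder whose contents include needle (bytes)."""
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if is_text_file(path):
                with open(path, "rb") as handle:
                    if needle in handle.read():
                        yield path
```

Note that an empty file counts as text here, and that only the first sample_size bytes are checked, so a file with binary data further in would still be searched.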

Upvotes: 2

Sinan Ünür

Reputation: 118156

You can use the Python interface to libmagic to identify file formats.

>>> import magic
>>> f = magic.Magic(mime=True)
>>> f.from_file('testdata/test.txt')
'text/plain'

For more examples, see the repo.

Upvotes: 5

Mike Cialowicz

Reputation: 10030

Use Python's mimetypes library:

import mimetypes

# Note: guess_type() looks only at the file name (its extension), not at the contents
if mimetypes.guess_type('full path to document here')[0] == 'text/plain':
    print('file is plaintext')
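
A small sketch of how this behaves in practice, assuming you want to accept any text/* type rather than only text/plain. The helper name looks_like_text_by_name is hypothetical. Since mimetypes works from the file name alone, a file with no extension or a misleading one will not be detected correctly.

```python
import mimetypes


def looks_like_text_by_name(path):
    """Return True if the file name maps to a text/* MIME type."""
    mime_type, _encoding = mimetypes.guess_type(path)
    # guess_type returns (None, None) when the extension is missing or unknown.
    return mime_type is not None and mime_type.startswith("text/")
```

So this only helps when the extensions can be trusted after all.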

Upvotes: 10

jdizzle

Reputation: 4154

If you're on Linux, you can parse the output of the file command-line tool.
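
One possible way to do that from Python, as a rough sketch: call file with its --brief and --mime-type options through subprocess and check whether the reported type starts with text/. The function names file_mime_type and is_text_mime are invented for this example, and treating every text/* type as "plain text" is an assumption you may want to adjust.

```python
import shutil
import subprocess


def is_text_mime(mime_type):
    """Treat any text/* MIME type as text (an assumption; widen or narrow as needed)."""
    return mime_type.strip().startswith("text/")


def file_mime_type(path):
    """Ask the `file` tool for the MIME type of path; return None if the tool is not installed."""
    if shutil.which("file") is None:
        return None
    result = subprocess.run(
        ["file", "--brief", "--mime-type", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

You would then search a file only when file_mime_type(path) returns something and is_text_mime() accepts it.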

Upvotes: 0
