Mike

Reputation: 37

File formats in Python

I want to write a script that takes a file name and checks whether it is a file. A file name ends with .txt, .exe, etc. Is there any library or module in Python that includes ALL file formats? If there isn't, how can I verify that a given input (like hey.txt, what.exe, etc.) is a file? P.S. I'm checking files on a website, not files on an operating system (like "https://www.magshimim.net/App_Themes/En/images/powered_by_priza_heb.gif"). Thanks to all the helpers :)

Upvotes: 0

Views: 179

Answers (4)

Dalen

Reputation: 4236

I suggest:

import os.path  # Or use ntpath / posixpath directly; both use "." as the extension separator

inputname = "something.txt"  # example input (assumption, replace with your own)

filename, ext = os.path.splitext(inputname)
# If filename and ext are both non-empty, the input looks like 'something.txt'
# splitext() keeps a leading dot in the name, so '.bashrc' gives ('.bashrc', '')
# If ext is empty and the name is not a dotfile, the input has no extension
if filename and ext:
    print("File %s with extension %s" % (filename, ext))
elif filename.startswith("."):
    print("File %s with no extension!" % filename)
else:
    print("%s is not a file by the 'must have an extension' rule!" % inputname)

You can also achieve the check with something like:

c = inputname.count(".")
# Needs a dot, must not end with a dot, and must not be a bare dotfile like '.bashrc'
if c != 0 and not inputname.endswith(".") and not (inputname.startswith(".") and c == 1):
    print("%s is a file because it has an extension!" % inputname)
else:
    print("%s is not a file, no extension!" % inputname)
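
Since the question is about URLs rather than local paths, here is a minimal Python 3 sketch of the same extension check applied to the path part of a URL. The helper name url_extension is made up for illustration, and the rule that dotfiles and trailing dots do not count as extensions is an assumption carried over from the snippets above.

```python
import posixpath
from urllib.parse import urlparse


def url_extension(url):
    """Return the extension of the last path segment of url, or '' if none."""
    path = urlparse(url).path        # drops the query string and fragment
    name = posixpath.basename(path)  # last path segment, e.g. 'logo.png'
    root, ext = posixpath.splitext(name)
    # A lone trailing dot ('name.') is not treated as an extension here.
    if root and ext and ext != ".":
        return ext
    return ""
```

An empty return value means the URL does not end in something that looks like a file name with an extension; it says nothing about whether the resource actually exists.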

If you really have to check against known formats, then yes, use the mimetypes module.

Or search around: I have seen a fairly extensive list of file formats packaged as a PHP library somewhere. You could take that and convert it to Python; a few find-and-replace passes should do it.

Upvotes: 0

seartun

Reputation: 191

If the files are located on a web server, you can use the Content-Type response header to get the type of each file.

import urllib2

urls = ['https://www.magshimim.net/App_Themes/En/images/powered_by_priza_heb.gif',
        'https://www.magshimim.net/images/magshimim_logo.png']

for url in urls:
    response = urllib2.urlopen(url)
    print url
    print response.headers.getheader('Content-type')    # Content Type
    print response.headers.getheader('Content-Length')  # Size
    print

The output should be:

https://www.magshimim.net/App_Themes/En/images/powered_by_priza_heb.gif
image/gif
1325

https://www.magshimim.net/images/magshimim_logo.png
image/png
8314
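
On Python 3, where urllib2 no longer exists, the same header lookup could be sketched with urllib.request. The function name content_info is hypothetical, and the use of a HEAD request (to avoid downloading the body) assumes the server answers HEAD; if it does not, drop the method argument to fall back to GET.

```python
from urllib.request import Request, urlopen


def content_info(url):
    """Return the (Content-Type, Content-Length) header values for url."""
    # HEAD asks for the headers only; assumes the server supports it.
    request = Request(url, method="HEAD")
    with urlopen(request) as response:
        headers = response.headers
        # Either value is None when the server omits that header.
        return headers.get("Content-Type"), headers.get("Content-Length")
```

Calling content_info(url) for each entry in a list of URLs gives the same two values that the loop above prints, as strings.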

Upvotes: 2

Delgan

Reputation: 19717

There is no such library because there is an unlimited number of file formats. I can create my own .something, and so can you; the file will still be a proper file.

Instead, you have to use os.path.isfile().


As @zero323 pointed out, and according to your edit, you should use the mimetypes library.

Then use mimetypes.guess_type(), which returns a (type, encoding) tuple; the type is None if the file type cannot be guessed.

See the full list of MIME types here.
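
A minimal Python 3 sketch of that approach, applied to a URL from the question. The wrapper name guess_url_type is made up for illustration; only the path part of the URL is passed on, so that a query string cannot confuse the guess.

```python
import mimetypes
from urllib.parse import urlparse


def guess_url_type(url):
    """Return the MIME type guessed from the URL's extension, or None."""
    path = urlparse(url).path  # ignore the query string and fragment
    mime_type, _encoding = mimetypes.guess_type(path)
    return mime_type
```

Note that the guess is based on the extension alone; nothing is downloaded, so a None result only means the extension is missing or unknown to mimetypes.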

Upvotes: 2

mohor chatt

Reputation: 356

The best approach would be to use a regular expression, since your script is checking whether the given name looks like a file or not. If you want to check whether a particular file actually exists, it would be better to use os.path.isfile(path). If you are comfortable with regular expressions, try writing one yourself; otherwise let me know and I will write it for you. Your feedback will be highly appreciated. Thank you.
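
As a rough sketch of what such a regular expression might look like in Python 3: the pattern below is only one possible rule (at least one character, then a dot, then an alphanumeric extension), and both the pattern and the name looks_like_file are assumptions for illustration, not a definitive test.

```python
import re

# Hypothetical rule: something, then a dot, then letters/digits to the end.
HAS_EXTENSION = re.compile(r".+\.[A-Za-z0-9]+")


def looks_like_file(name):
    """Return True if name has the shape 'something.ext'."""
    return HAS_EXTENSION.fullmatch(name) is not None
```

With this rule, names like hey.txt and what.exe pass, while a bare dotfile such as .bashrc or a name with no dot at all does not.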

Upvotes: 0
