jfs

Reputation: 414905

How to tell whether a file is executable on Windows in Python?

I'm writing a grepath utility that finds executables in %PATH% that match a pattern. I need to determine whether a given filename in the path is executable (the emphasis is on command-line scripts).

Based on "Tell if a file is executable" I've got:

import os
from pywintypes import error
from win32api   import FindExecutable, GetLongPathName

def is_executable_win(path):
    try:
        _, executable = FindExecutable(path)
        ext = lambda p: os.path.splitext(p)[1].lower()
        if (ext(path) == ext(executable) # reject *.cmd~, *.bat~ cases
            and samefile(GetLongPathName(executable), path)):
            return True
        # path is a document with assoc. check whether it has extension
        # from %PATHEXT% 
        pathexts = os.environ.get('PATHEXT', '').split(os.pathsep)
        return any(ext(path) == e.lower() for e in pathexts)
    except error:
        return None # not an exe or a document with assoc.

Where samefile is:

try: samefile = os.path.samefile
except AttributeError:    
    def samefile(path1, path2):
        rp = lambda p: os.path.realpath(os.path.normcase(p))
        return rp(path1) == rp(path2)

How is_executable_win could be improved in the given context? What functions from Win32 API could help?

P.S.

Examples

What is "executable" in this question

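from subprocess import PIPE, Popen

def is_executable_win_destructive(path):
    #NOTE: it assumes `path` <-> `barename` for the sake of example
    barename = os.path.splitext(os.path.basename(path))[0]
    # run the bare name through the shell and capture its output
    p = Popen(barename, stdout=PIPE, stderr=PIPE, shell=True)
    stdout, stderr = p.communicate()
    # anything other than the shell's "not recognized" error counts as executable
    return p.poll() != 1 or stdout != '' or stderr != error_message(barename)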

Where error_message() depends on the system language. The English version is:

def error_message(barename):
    return "'%(barename)s' is not recognized as an internal" \
           " or external\r\ncommand, operable program or batch file.\r\n" \
           %  dict(barename=barename)

Whatever is_executable_win_destructive() returns defines whether the path points to an executable for the purposes of this question.

Example:

>>> path = r"c:\docs\somefile.doc"
>>> barename = "somefile"

After that it executes %COMSPEC% (cmd.exe by default):

c:\cwd> cmd.exe /c somefile

If the output looks like this:

'somefile' is not recognized as an internal or external
command, operable program or batch file.

Then the path is not an executable; otherwise it is (let's assume there is a one-to-one correspondence between path and barename for the sake of example).

Another example:

>>> path = r'c:\bin\grepath.py'
>>> barename = 'grepath'

If .PY is in %PATHEXT% and c:\bin is in %PATH% then:

c:\docs> grepath
Usage:
  grepath.py [options] PATTERN
  grepath.py [options] -e PATTERN

grepath.py: error: incorrect number of arguments

The above output is not equal to error_message(barename) therefore 'c:\bin\grepath.py' is an "executable".

So the question is: how can I find out whether the path will produce the error without actually running it? Which Win32 API function is involved, and which conditions trigger the 'is not recognized as an internal..' error?
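As a rough model of the lookup being asked about (not a description of cmd.exe internals), the sketch below tries each %PATHEXT% extension in each search directory and returns the first file that exists. The name resolve_command and its parameters are hypothetical. It ignores internal commands and file associations, and it relies on the filesystem being case-insensitive, as it normally is on Windows.

```python
import os


def resolve_command(barename, search_dirs, pathexts):
    """Return the first existing file that `barename` could resolve to, or None.

    search_dirs: directories in lookup order (assumed here to be the current
                 directory followed by the entries of %PATH%).
    pathexts:    extensions in %PATHEXT% order, e.g. ['.COM', '.EXE', '.PY'].
    """
    for dirpath in search_dirs:
        for ext in pathexts:
            candidate = os.path.join(dirpath, barename + ext)
            if os.path.isfile(candidate):
                return candidate
    # Nothing matched: this is the case where the shell would be expected
    # to print its "is not recognized" error.
    return None
```

For real use, search_dirs could be built as [os.getcwd()] + os.environ.get('PATH', '').split(os.pathsep) and pathexts as os.environ.get('PATHEXT', '').split(os.pathsep).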

Upvotes: 7

Views: 7301

Answers (6)

jfs

Reputation: 414905

Here's the grepath.py that I've linked in my question:

#!/usr/bin/env python
"""Find executables in %PATH% that match PATTERN.

"""
#XXX: remove --use-pathext option

import fnmatch, itertools, os, re, sys, warnings
from optparse import OptionParser
from stat import S_IMODE, S_ISREG, ST_MODE
from subprocess import PIPE, Popen


def warn_import(*args):
    """pass '-Wd' option to python interpreter to see these warnings."""
    warnings.warn("%r" % (args,), ImportWarning, stacklevel=2)


class samefile_win:
    """
http://timgolden.me.uk/python/win32_how_do_i/see_if_two_files_are_the_same_file.html
"""
    @staticmethod
    def get_read_handle (filename):
        return win32file.CreateFile (
            filename,
            win32file.GENERIC_READ,
            win32file.FILE_SHARE_READ,
            None,
            win32file.OPEN_EXISTING,
            0,
            None
            )

    @staticmethod
    def get_unique_id (hFile):
        (attributes,
         created_at, accessed_at, written_at,
         volume,
         file_hi, file_lo,
         n_links,
         index_hi, index_lo
         ) = win32file.GetFileInformationByHandle (hFile)
        return volume, index_hi, index_lo

    @staticmethod
    def samefile_win(filename1, filename2):
        """Whether filename1 and filename2 represent the same file.

It works for subst, ntfs hardlinks, junction points.
It works unreliably for network drives.

Based on GetFileInformationByHandle() Win32 API call.
http://timgolden.me.uk/python/win32_how_do_i/see_if_two_files_are_the_same_file.html
"""
        if samefile_generic(filename1, filename2): return True
        try:
            hFile1 = samefile_win.get_read_handle (filename1)
            hFile2 = samefile_win.get_read_handle (filename2)
            are_equal = (samefile_win.get_unique_id (hFile1)
                         == samefile_win.get_unique_id (hFile2))
            hFile2.Close ()
            hFile1.Close ()
            return are_equal
        except win32file.error:
            return None


def canonical_path(path):
    """NOTE: it might return wrong path for paths with symbolic links."""
    return os.path.realpath(os.path.normcase(path))


def samefile_generic(path1, path2):
    return canonical_path(path1) == canonical_path(path2)


class is_executable_destructive:
    @staticmethod
    def error_message(barename):
        r"""
"'%(barename)s' is not recognized as an internal or external\r\n
command, operable program or batch file.\r\n"

in Russian:
"""
        return '"%(barename)s" \xad\xa5 \xef\xa2\xab\xef\xa5\xe2\xe1\xef \xa2\xad\xe3\xe2\xe0\xa5\xad\xad\xa5\xa9 \xa8\xab\xa8 \xa2\xad\xa5\xe8\xad\xa5\xa9\r\n\xaa\xae\xac\xa0\xad\xa4\xae\xa9, \xa8\xe1\xaf\xae\xab\xad\xef\xa5\xac\xae\xa9 \xaf\xe0\xae\xa3\xe0\xa0\xac\xac\xae\xa9 \xa8\xab\xa8 \xaf\xa0\xaa\xa5\xe2\xad\xeb\xac \xe4\xa0\xa9\xab\xae\xac.\r\n' % dict(barename=barename)

    @staticmethod
    def is_executable_win_destructive(path):
        # assume path <-> barename that is false in general
        barename = os.path.splitext(os.path.basename(path))[0]
        p = Popen(barename, stdout=PIPE, stderr=PIPE, shell=True)
        stdout, stderr = p.communicate()
        # error_message is a static method of this class, so qualify the call
        return (p.poll() != 1 or stdout != ''
                or stderr != is_executable_destructive.error_message(barename))


def is_executable_win(path):
    """Based on:
http://timgolden.me.uk/python/win32_how_do_i/tell-if-a-file-is-executable.html

Known bugs: treat some "*~" files as executable, e.g. some "*.bat~" files
"""
    try:
        _, executable = FindExecutable(path)
        return bool(samefile(GetLongPathName(executable), path))
    except error:
        return None # not an exe or a document with assoc.


def is_executable_posix(path):
    """Whether the file is executable.

Based on which.py from stdlib
"""
    #XXX it ignores effective uid, gid?
    try: st = os.stat(path)
    except os.error:
        return None

    isregfile = S_ISREG(st[ST_MODE])
    isexemode = (S_IMODE(st[ST_MODE]) & 0111)
    return bool(isregfile and isexemode)

try:
    #XXX replace with ctypes?
    from win32api import FindExecutable, GetLongPathName, error
    is_executable = is_executable_win
except ImportError, e:
    warn_import("is_executable: fall back on posix variant", e)
    is_executable = is_executable_posix

try: samefile = os.path.samefile
except AttributeError, e:
    warn_import("samefile: fallback to samefile_win", e)
    try:
        import win32file
        samefile = samefile_win.samefile_win
    except ImportError, e:
        warn_import("samefile: fallback to generic", e)
        samefile = samefile_generic

def main():
    parser = OptionParser(usage="""
%prog [options] PATTERN
%prog [options] -e PATTERN""", description=__doc__)
    opt = parser.add_option
    opt("-e", "--regex", metavar="PATTERN",
        help="use PATTERN as a regular expression")
    opt("--ignore-case", action="store_true", default=True,
        help="""[default] ignore case when --regex is present; for \
non-regex PATTERN both FILENAME and PATTERN are first \
case-normalized if the operating system requires it otherwise \
unchanged.""")
    opt("--no-ignore-case", dest="ignore_case", action="store_false")
    opt("--use-pathext", action="store_true", default=True,
        help="[default] whether to use %PATHEXT% environment variable")
    opt("--no-use-pathext", dest="use_pathext", action="store_false")
    opt("--show-non-executable", action="store_true", default=False,
        help="show non executable files")

    (options, args) = parser.parse_args()

    if len(args) != 1 and not options.regex:
       parser.error("incorrect number of arguments")
    if not options.regex:
       pattern = args[0]
    del args

    if options.regex:
       filepred = re.compile(options.regex, options.ignore_case and re.I).search
    else:
       fnmatch_ = fnmatch.fnmatch if options.ignore_case else fnmatch.fnmatchcase
       for file_pattern_symbol in "*?":
           if file_pattern_symbol in pattern:
               break
       else: # match in any place if no explicit file pattern symbols supplied
           pattern = "*" + pattern + "*"
       filepred = lambda fn: fnmatch_(fn, pattern)

    if not options.regex and options.ignore_case:
       filter_files = lambda files: fnmatch.filter(files, pattern)
    else:
       filter_files = lambda files: itertools.ifilter(filepred, files)

    if options.use_pathext:
       pathexts = frozenset(map(str.upper,
            os.environ.get('PATHEXT', '').split(os.pathsep)))

    seen = set()
    for dirpath in os.environ.get('PATH', '').split(os.pathsep):
        if os.path.isdir(dirpath): # assume no expansion needed
           # visit "each" directory only once
           # it is unaware of subst drives, junction points, symlinks, etc
           rp = canonical_path(dirpath)
           if rp in seen: continue
           seen.add(rp); del rp

           for filename in filter_files(os.listdir(dirpath)):
               path = os.path.join(dirpath, filename)
               isexe = is_executable(path)

               if isexe == False and is_executable == is_executable_win:
                  # path is a document with associated program
                  # check whether it is a script (.pl, .rb, .py, etc)
                  if not isexe and options.use_pathext:
                     ext = os.path.splitext(path)[1]
                     isexe = ext.upper() in pathexts

               if isexe:
                  print path
               elif options.show_non_executable:
                  print "non-executable:", path


if __name__=="__main__":
   main()

Upvotes: 1

Jiri

Reputation: 16625

I think this should be sufficient:

  1. Check whether the file extension is in PATHEXT, i.e. whether the file is directly executable.
  2. Use the cmd.exe command "assoc .ext" to see whether the file is associated with some executable (that executable will be launched when you launch the file). You can capture and parse the output of assoc without arguments, collect all associated extensions, and check the tested file's extension against them.
  3. Other file extensions will trigger the error "command is not recognized ...", so you can assume that such files are NOT executable.

I don't really understand how you can tell the difference between somefile.py and somefile.txt, because their associations can be exactly the same. You can configure the system to run .txt files the same way as .py files.
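The three steps above can be sketched as follows. This is only an illustration: the function names are hypothetical, and the assoc output is assumed to consist of lines of the form .ext=FileType. The text to parse would be captured on Windows by running assoc through the shell; here it is passed in as a string so the logic can be tried anywhere.

```python
import ntpath  # Windows path rules, regardless of the platform running this


def parse_pathext(pathext):
    """Split a %PATHEXT%-style string into a set of lower-cased extensions."""
    return set(e.strip().lower() for e in pathext.split(";") if e.strip())


def parse_assoc_output(text):
    """Map extension -> file type from lines such as '.py=Python.File'."""
    assoc = {}
    for line in text.splitlines():
        ext, sep, filetype = line.strip().partition("=")
        if sep and ext.startswith("."):
            assoc[ext.lower()] = filetype
    return assoc


def classify(path, pathexts, assoc):
    """Return 'executable', 'associated' or 'unknown' for the path's extension."""
    ext = ntpath.splitext(path)[1].lower()
    if ext in pathexts:
        return "executable"   # step 1: extension listed in PATHEXT
    if ext in assoc:
        return "associated"   # step 2: a document with an associated program
    return "unknown"          # step 3: expected to trigger the shell error


pathexts = parse_pathext(".COM;.EXE;.BAT;.CMD;.PY")
assoc = parse_assoc_output(".doc=Word.Document.8\r\n.py=Python.File\r\n")
print(classify(r"c:\bin\grepath.py", pathexts, assoc))     # executable
print(classify(r"c:\docs\somefile.doc", pathexts, assoc))  # associated
```

Note that classify() returns only the first matching category; whether "associated" should count as executable is exactly the policy decision the question leaves open.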

Upvotes: 3

Jesse Weigert

Reputation: 4854

Your question can't be answered. Windows can't tell the difference between a file that is associated with a scripting language and one associated with some other arbitrary program. As far as Windows is concerned, a .PY file is simply a document that is opened by python.exe.

Upvotes: -1

Unknown

Reputation: 46831

Parse the PE format.

http://code.google.com/p/pefile/

This is probably the best solution you will get, other than using Python to actually try to run the program.

Edit: I see you also want files that have associations. That will require digging through the registry, and I don't have the details for that.

Edit2: I also see that you differentiate between .doc and .py. This is a rather arbitrary distinction that has to be specified with manual rules, because to Windows they are both just file extensions that some program reads.

Upvotes: 0

Geo

Reputation: 96987

shoosh beat me to it :)

If I remember correctly, you should try to read the first 2 bytes of the file. If you get back "MZ", you have an exe.


with open(path, "rb") as hnd:  # `path` is the file to test
    if hnd.read(2) == b"MZ":   # compare bytes, since the file is opened in binary mode
        print("exe")

Upvotes: 3

shoosh

Reputation: 79021

A Windows PE file always starts with the characters "MZ". However, this also includes DLLs, which are not necessarily executable.
To check for this you have to open the file and read the header, so it's probably not what you're looking for.
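To tell a DLL from an EXE, the check has to go past "MZ" and read the Characteristics field of the COFF header. Below is a minimal sketch of that, not a full PE parser. The names pe_kind and pe_kind_of_file are hypothetical, and it assumes the PE header lies within the first 4096 bytes of the file.

```python
import struct

IMAGE_FILE_EXECUTABLE_IMAGE = 0x0002  # COFF Characteristics flag
IMAGE_FILE_DLL = 0x2000               # COFF Characteristics flag


def pe_kind(data):
    """Classify header bytes as 'exe', 'dll', or None (not a runnable PE image)."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # e_lfanew: little-endian offset of the PE signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        return None
    # The 20-byte COFF header follows the 4-byte signature; Characteristics
    # is its last 2-byte field (offset 18 within the header).
    if len(data) < pe_offset + 24:
        return None
    (flags,) = struct.unpack_from("<H", data, pe_offset + 4 + 18)
    if not flags & IMAGE_FILE_EXECUTABLE_IMAGE:
        return None
    return "dll" if flags & IMAGE_FILE_DLL else "exe"


def pe_kind_of_file(path, max_header=4096):
    """Read the start of `path` and classify it with pe_kind()."""
    with open(path, "rb") as f:
        return pe_kind(f.read(max_header))
```

This still only covers native binaries; scripts and documents with associations need a separate check.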

Upvotes: 2
