firebush

Reputation: 5880

How to programmatically detect whether a file is a Python script

I'd like to create a git pre-commit hook for my project that runs autopep8 on files modified by the potential commit. I only want to run it on Python files, not the other C++ files, text files, etc. How can I programmatically detect whether a file is a Python file? Not all of the Python files in the repository have the .py extension, so I cannot rely upon that.

Upvotes: 7

Views: 1633

Answers (2)

sage

Reputation: 5514

I am surprised not to see a solid answer to this. I'm leaning toward:

  1. If it ends in ".py", it's a Python file
  2. If it has a "#! /usr/bin/env python[3]" line, it's a Python file

I know that leaves out things like scripts that hard-code the interpreter, such as:

#! /some/virtual/env/bin/python3

I'm tempted to check for #! followed by the word python anywhere.

If you want to do the same, a first cut (with some debug print statements) can look like:

import os
import re


def is_readable_py_file(filename: str) -> bool:
    """Determine if filename is a python file and return bool."""
    if not os.path.isfile(filename):
        return False

    if os.path.splitext(filename)[1] == ".py":
        return True

    # Allow #!-specified files without ".py" extension
    try:
        with open(filename) as infile:
            first_line = infile.readline()
            if re.match(r"\s*#!\s*/usr/bin/env\s+python", first_line):
                return True
    except Exception as exc:
        print(f"Caught exception: {exc}")
        print(f"Assuming not a Python file: '{filename}'")

    return False

I expect that no approach is ideal for everybody and I think this is quite crude, but if you just want to copy/paste to get started, have at it!
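To tie this back to the pre-commit hook in the question, here is a rough sketch of how the staged files could be collected and filtered. The helper names staged_files and select_python_files are made up for illustration, and the predicate is passed in so that any detection function with a str -> bool signature can be plugged in. It assumes git is on the PATH and the hook runs inside the repository.

```python
import subprocess
from typing import Callable, Iterable, List


def staged_files() -> List[str]:
    """Return paths staged for commit (added, copied, or modified)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM", "-z"],
        capture_output=True,
        text=True,
        check=True,
    )
    # -z separates paths with NUL, so names containing spaces survive intact.
    return [path for path in result.stdout.split("\0") if path]


def select_python_files(
    paths: Iterable[str], is_python: Callable[[str], bool]
) -> List[str]:
    """Keep only the paths that the supplied predicate accepts."""
    return [path for path in paths if is_python(path)]
```

In a hook you would call something like select_python_files(staged_files(), is_readable_py_file) and hand the resulting list to autopep8.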

Oh, the alternative check I'm considering would be (it matches everything the /usr/bin/env one matches, so you can substitute it):

            if re.match(r"\s*#!.*python", first_line):  # python anywhere in shebang
                return True
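
To see how the two patterns differ, here is a small comparison on a few first lines. The sample lines and the classify helper are my own, not taken from any real repository. Note that the broader pattern also accepts a shell script that merely mentions python on its shebang line, so it trades some false positives for better coverage.

```python
import re

# The stricter pattern: only "/usr/bin/env python..." shebangs.
ENV_ONLY = re.compile(r"\s*#!\s*/usr/bin/env\s+python")
# The broader pattern: the word "python" anywhere after "#!".
ANY_PYTHON = re.compile(r"\s*#!.*python")


def classify(first_line: str) -> tuple:
    """Return (matches strict pattern, matches broad pattern)."""
    return (
        bool(ENV_ONLY.match(first_line)),
        bool(ANY_PYTHON.match(first_line)),
    )


samples = [
    "#!/usr/bin/env python3\n",            # both patterns match
    "#! /some/virtual/env/bin/python3\n",  # only the broad pattern matches
    "#!/usr/bin/env -S python3 -u\n",      # only the broad pattern matches
    "#!/bin/sh # wrapper around python\n", # broad pattern false positive
    "#!/bin/bash\n",                       # neither matches
]

for line in samples:
    print(repr(line), classify(line))
```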

Upvotes: 0

Jan Matejka

Reputation: 1968

You can't.

At least not in the general case and with perfect accuracy. Your best bet is to make sure all the Python files in the repo do have the .py extension, or are distinguished from other files in a small, finite number of simple ways.

Your next best bet is the file command.
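
If you want to call file from a Python hook, something along these lines might work as a starting point. The function names are hypothetical, and it assumes a file binary that supports --brief and --mime-type. The exact MIME string it reports for Python scripts varies between versions (I have seen both text/x-python and text/x-script.python), so the sketch only checks for the substring "python". It is still a heuristic, not a guarantee.

```python
import os
import subprocess


def mime_looks_like_python(mime_type: str) -> bool:
    """Heuristic: treat any MIME type mentioning 'python' as Python."""
    return "python" in mime_type.strip().lower()


def is_python_by_file_command(path: str) -> bool:
    """Ask the external 'file' utility what it thinks the file is."""
    # Skip missing paths so an error message echoing the path is never parsed.
    if not os.path.isfile(path):
        return False
    try:
        result = subprocess.run(
            ["file", "--brief", "--mime-type", path],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # 'file' is not installed or failed: assume not a Python file.
        return False
    return mime_looks_like_python(result.stdout)
```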

Upvotes: 1
