R32415

Reputation: 73

Extracting Python function names, bodies, and docstrings from a .py file

I have been programming in Python for several years, but I only just learned that you can use an attribute like:

<function>.__doc__

to return the docstring of a Python function! However, this is still insufficient for the task I have in mind. I need to extract the name, docstring, and body of each function defined in a particular .py file. For example, suppose I have the following .py file:

import numpy as np

def get_palindrome(string):
  """Returns the palindrome of the string argument"""
  return string[::-1]

def break_my_computer():
  """Destroys your RAM"""
  a = []
  while True:
    a.append(1)

This should be able to return the following information:

info = {1: {
            'name': 'get_palindrome',
            'docstring': 'Returns the palindrome of the string argument',
            'body': 'return string[::-1]'
            },
        2: {
            'name': 'break_my_computer',
            'docstring': 'Destroys your RAM',
            'body': 'a = []\nwhile True:\n  a.append(1)'
        }   }

What is the easiest way to get this information in Python? (I would prefer not to use the regular expressions library or do any text parsing and matching.)

Note: When encountering multi-line docstrings or function bodies, \n (newline) characters should be present in the corresponding output; tabs should be represented either by an escape sequence or by spaces.

Upvotes: 3

Views: 2014

Answers (1)

Hack5

Reputation: 3601

The correct answer is - DO NOT DO THIS!

I'm gonna do it anyway.

The first two parts of your question (the name and the docstring) are fairly simple, so I'll get them over and done with first:

import inspect

import module_name as module
# To load the module from a file by path instead (mutually exclusive with the previous line):
# import importlib.util
#
# spec = importlib.util.spec_from_file_location("module_name", "/path/to/module_name.py")
# module = importlib.util.module_from_spec(spec)
# spec.loader.exec_module(module)

funcs = []
for name, value in vars(module).items():
    # Skip private names and anything that is not a plain Python function
    if name.startswith("_") or not inspect.isfunction(value):
        continue
    doc = inspect.getdoc(value)
    funcs.append({"name": name, "docstring": doc})
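As a self-contained sketch of the load-by-path variant from the comments above: the sample file, the module name `sample_module`, and both helper functions are made up for illustration. It filters with `inspect.isfunction`, which is narrower than `callable()` (classes are callable too), and it also skips functions that the file merely imported.

```python
import importlib.util
import inspect
import tempfile
import textwrap
from pathlib import Path


def load_module_from_path(path, module_name="loaded_module"):
    """Import a .py file by path. Note: this executes the file's top-level code."""
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def names_and_docstrings(module):
    """Return name/docstring pairs for functions defined in the module itself."""
    result = []
    for name, value in vars(module).items():
        # isfunction() rejects classes and builtins; the __module__ check
        # rejects functions that were only imported into the module.
        if inspect.isfunction(value) and value.__module__ == module.__name__:
            result.append({"name": name, "docstring": inspect.getdoc(value)})
    return result


# Hypothetical sample file, written to a temporary directory for the demo.
SAMPLE = textwrap.dedent('''
    from os.path import join

    def get_palindrome(string):
        """Returns the palindrome of the string argument"""
        return string[::-1]

    class Helper:
        """A class: callable() accepts it, isfunction() does not."""
''')

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample_module.py"
    sample_path.write_text(SAMPLE)
    sample = load_module_from_path(sample_path, "sample_module")
    found = names_and_docstrings(sample)

print(found)
```

Keep in mind that importing a file runs it, so this is only appropriate for files you trust.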

Now we reach the hard bit - the body. Python does not keep a function's source text in memory; the function object only holds compiled bytecode. You can read the source back from the file with inspect.getsource, but this gives wrong results if the file was modified on disk after Python loaded the module, and it raises an error if no source file can be found. If you want to take this approach, use this code:

import inspect

import module_name as module
# To load the module from a file by path instead (mutually exclusive with the previous line):
# import importlib.util
#
# spec = importlib.util.spec_from_file_location("module_name", "/path/to/module_name.py")
# module = importlib.util.module_from_spec(spec)
# spec.loader.exec_module(module)

funcs = []
for name, value in vars(module).items():
    # Skip private names and anything that is not a plain Python function
    if name.startswith("_") or not inspect.isfunction(value):
        continue
    doc = inspect.getdoc(value)
    # Crude: keeps everything after the first colon, so the result still
    # contains the docstring and the original indentation, and it breaks if
    # the signature itself contains a colon (e.g. type annotations)
    code = inspect.getsource(value).split(":", maxsplit=1)[1]
    funcs.append({"name": name, "docstring": doc, "body": code})
print(funcs)
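Splitting on the first colon is fragile, and importing the file executes it. As an alternative sketch (not part of the approach above), the standard `ast` module can parse the file's text without running it and without regular expressions. The helper name `extract_functions` is made up here; the code assumes Python 3.8+ for `end_lineno`, only looks at top-level functions, and assumes each body starts on its own line (not `def f(): return 1`).

```python
import ast
import textwrap


def extract_functions(source):
    """Map 1-based positions to name/docstring/body for top-level functions."""
    tree = ast.parse(source)
    lines = source.splitlines()
    info = {}
    for node in tree.body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        docstring = ast.get_docstring(node)
        statements = node.body
        # When a docstring exists it is the first statement; leave it out.
        if docstring is not None:
            statements = statements[1:]
        if statements:
            # lineno and end_lineno are 1-based line numbers in the source.
            start = statements[0].lineno - 1
            end = statements[-1].end_lineno
            body = textwrap.dedent("\n".join(lines[start:end]))
        else:
            body = ""
        info[len(info) + 1] = {
            "name": node.name,
            "docstring": docstring,
            "body": body,
        }
    return info


# The example file from the question; numpy is never imported because
# ast.parse() only parses the text and does not execute it.
SOURCE = textwrap.dedent('''
    import numpy as np

    def get_palindrome(string):
      """Returns the palindrome of the string argument"""
      return string[::-1]

    def break_my_computer():
      """Destroys your RAM"""
      a = []
      while True:
        a.append(1)
''')

info = extract_functions(SOURCE)
print(info)
```

To use it on a real file, read the text first, for example `extract_functions(open("/path/to/module_name.py").read())`. Comments that sit between the docstring and the first statement are not included in the body.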

The best solution to this problem is not to store the source code at all. You should instead use the marshal module to serialise the function's code object. This is sort of an ugly hack: the marshal format is not guaranteed to stay compatible between Python versions, and what you store is bytecode rather than readable source. If there is any way you can avoid storing the code of a function, do so, because loading and running stored code is a security risk.

Full code including marshalling:

import inspect
import marshal
import types

import module_name as module
# To load the module from a file by path instead (mutually exclusive with the previous line):
# import importlib.util
#
# spec = importlib.util.spec_from_file_location("module_name", "/path/to/module_name.py")
# module = importlib.util.module_from_spec(spec)
# spec.loader.exec_module(module)

funcs = []
for name, value in vars(module).items():
    # Skip private names and anything without a Python code object
    if name.startswith("_") or not inspect.isfunction(value):
        continue
    doc = inspect.getdoc(value)
    code = marshal.dumps(value.__code__)
    funcs.append({"name": name, "docstring": doc, "body": code})

for value in funcs:
    name = value["name"]
    doc = value["docstring"]
    code = value["body"]
    # Rebuild each function from its marshalled code object; use the module's
    # namespace as globals so that names such as its imports still resolve
    func = types.FunctionType(marshal.loads(code), vars(module), name)
    func.__doc__ = doc
    # Invoke the function; this only works for functions that take no arguments
    func()
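To see the round trip in isolation, here is a minimal sketch with a single made-up function, `shout`, that does not depend on any external module. It assumes the bytes are loaded by the same Python version that produced them.

```python
import marshal
import types


def shout(text):
    """Return the text in upper case with an exclamation mark."""
    return text.upper() + "!"


# Serialise the compiled code object to bytes. This is bytecode, not source text.
payload = marshal.dumps(shout.__code__)

# Rebuild a function from the bytes. Only unmarshal data you produced yourself.
restored_code = marshal.loads(payload)
restored = types.FunctionType(restored_code, shout.__globals__, shout.__name__)
# The docstring is copied by hand here, as in the code above.
restored.__doc__ = shout.__doc__

print(restored("hello"))  # prints HELLO!
```

Default argument values live on the function object (`__defaults__`), not on the code object, so they would need to be stored and restored separately.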

Upvotes: 2
