safetyduck

Reputation: 6854

Get Python function source excluding the docstring?

You might want the docstring not to affect the hash, for example in something like joblib's Memory.

Is there a good way of stripping the docstring? inspect.getsource and inspect.getdoc kind of fight each other: inspect.getdoc returns a "cleaned" docstring, so it no longer matches the text that inspect.getsource returns.
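To illustrate the mismatch, here is a minimal sketch. The function `add` and its source text are made up for the example; the string stands in for what inspect.getsource would return.

```python
import inspect

# Hypothetical source text, standing in for the output of inspect.getsource.
source = (
    "def add(a, b):\n"
    '    """Add two numbers.\n'
    "\n"
    "    Both arguments must be numeric.\n"
    '    """\n'
    "    return a + b\n"
)

# Define the function from that text so it carries the docstring.
namespace = {}
exec(source, namespace)

# inspect.getdoc strips indentation and surrounding blank lines.
cleaned = inspect.getdoc(namespace["add"])

print(cleaned in source)  # False: the indentation no longer matches

# So a naive string replacement leaves the source untouched.
unchanged = source.replace(cleaned, "")
print(unchanged == source)  # True
```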

Upvotes: 4

Views: 673

Answers (4)

Leandro Lima

Reputation: 5418

In case anyone is still looking for a solution for this, this is how I managed to build it:

from ast import Constant, Expr, FunctionDef, Module, parse
from inspect import getsource
from textwrap import dedent
from types import FunctionType
from typing import cast


def get_source_without_docstring(obj: FunctionType) -> str:
    # Get cleanly indented source code of the function
    obj_source = dedent(getsource(obj))

    # Parse the source code into an Abstract Syntax Tree.
    # The root of this tree is a Module node.
    module: Module = parse(obj_source)

    # The first child of a Module node is FunctionDef node that represents
    # the function definition. We cast module.body[0] to FunctionDef for type safety.
    function_def = cast(FunctionDef, module.body[0])

    # The first statement of a function could be a docstring, which in AST
    # is represented as an Expr node. To remove the docstring, we need to find
    # this Expr node.
    first_stmt = function_def.body[0]

    # Check if the first statement is a docstring (a constant str expression)
    if (
        isinstance(first_stmt, Expr)
        and isinstance(first_stmt.value, Constant)
        and isinstance(first_stmt.value.value, str)
    ):
        # Split the original source code by lines
        code_lines: list[str] = obj_source.splitlines()

        # Delete the lines corresponding to the docstring from the list.
        # Note: We are using 0-based list index, but the line numbers in the
        # parsed AST nodes are 1-based. So, we need to subtract 1 from the
        # 'lineno' property of the node.
        del code_lines[first_stmt.lineno - 1 : first_stmt.end_lineno]

        # Join the remaining lines back into a single string
        obj_source = "\n".join(code_lines)

    # Return the source code of function without docstrings
    return obj_source

Note: the code is my own; the comments were written by OpenAI's GPT.
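For the hashing use case in the question, a variation on the same idea is to drop the docstring node from the tree and regenerate the source with ast.unparse (Python 3.9+). This is a sketch of a different technique, not the code above: `strip_docstring` and `source_hash` are hypothetical names, and ast.unparse also discards comments and original formatting, which may or may not be what you want.

```python
import ast
import hashlib
from textwrap import dedent


def strip_docstring(source: str) -> str:
    """Return regenerated source of the first definition, minus its docstring."""
    tree = ast.parse(dedent(source))
    node = tree.body[0]
    if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
        raise TypeError("expected a function or class definition")

    # ast.get_docstring returns None when the body does not start with a docstring.
    if ast.get_docstring(node, clean=False) is not None:
        # Keep the body non-empty so the result is still valid Python.
        node.body = node.body[1:] or [ast.Pass()]

    # Regenerating from the tree normalizes whitespace and drops comments.
    return ast.unparse(tree)


def source_hash(source: str) -> str:
    """Hash the docstring-free, normalized source (hypothetical helper)."""
    return hashlib.sha256(strip_docstring(source).encode("utf-8")).hexdigest()


with_doc = 'def add(a, b):\n    """Add two numbers."""\n    return a + b\n'
without_doc = "def add(a, b):\n    return a + b\n"

print(strip_docstring(with_doc))
print(source_hash(with_doc) == source_hash(without_doc))
```

In practice the source string would come from inspect.getsource, as in the function above.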

Upvotes: 2

fccoelho

Reputation: 6204

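There is a simple solution if you only need to clear the docstring at runtime (the text returned by inspect.getsource is unchanged):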

def fun(a,b):
    '''hahah'''
    return a+b
# We simply clear the docstring on the function object
fun.__doc__ = ''
# help() prints its output itself and returns None, so no print() is needed
help(fun)

This code yields:

Help on function fun in module __main__:

fun(a, b)

Upvotes: 0

Tryph

Reputation: 6209

If you just want to hash the body of a function, regardless of the docstring, you can use the function's __code__ attribute.

It gives access to a code object whose bytecode (co_code) does not change when the docstring text changes.

Unfortunately, you will not be able to get a readable version of the source this way. The exact bytes shown below depend on the Python version.

def foo():
    """Prints 'foo'"""
    print('foo')


print(foo.__doc__)  # Prints 'foo'
print(foo.__code__.co_code)  # b't\x00d\x01\x83\x01\x01\x00d\x02S\x00'
foo.__doc__ += 'pouet'
print(foo.__doc__)  # Prints 'foo'pouet
print(foo.__code__.co_code)  # b't\x00d\x01\x83\x01\x01\x00d\x02S\x00'
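To turn this into a hash, one option is to feed co_code to hashlib. The sketch below assumes two made-up functions that differ only in docstring text; `bytecode_digest` is a hypothetical helper. Note that co_code references constants by index, so hashing it alone may not distinguish functions that differ only in the values they use.

```python
import hashlib


def short_doc():
    """Short."""
    return "value"


def long_doc():
    """A much longer description that changes nothing about the behaviour."""
    return "value"


def bytecode_digest(func) -> str:
    """Hash only the raw bytecode of a function (hypothetical helper)."""
    return hashlib.sha256(func.__code__.co_code).hexdigest()


# The docstring text differs, but the bytecode is the same.
print(bytecode_digest(short_doc) == bytecode_digest(long_doc))
```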

Upvotes: 1

bb1950328

Reputation: 1599

One approach is to delete the docstring from the source using regex:

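import re
# DOTALL lets .*? span multi-line docstrings; \s* skips the newline and indentation after the colon
nodoc = re.sub(r"(:\s*)'''.*?'''", r"\1", source, flags=re.DOTALL)
nodoc = re.sub(r'(:\s*)""".*?"""', r"\1", nodoc, flags=re.DOTALL)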

This currently works for functions and classes only; maybe someone can find a pattern for modules too.

Upvotes: 0
