Reputation: 6854
Sometimes you want a function's docstring to have no effect on a hash of its source, for example as in joblib Memory.
Is there a good way of stripping the docstring? inspect.getsource and inspect.getdoc work against each other: getdoc returns a "cleaned" docstring (re-indented and trimmed), so it no longer matches the text that getsource returns.
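To make the mismatch concrete, here is a minimal sketch. `SOURCE` and `f` are made-up names for illustration; the sketch parses a source string instead of calling `inspect.getsource`, and uses `inspect.cleandoc`, which is the cleaning step that `inspect.getdoc` applies.

```python
import ast
import inspect

# Hypothetical function source, used only for illustration.
SOURCE = '''
def f():
    """Summary line.

        Indented detail.
    """
    return 1
'''

func_node = ast.parse(SOURCE).body[0]
# Docstring text exactly as written in the source.
raw_doc = ast.get_docstring(func_node, clean=False)
# The same text after the cleaning that inspect.getdoc performs.
clean_doc = inspect.cleandoc(raw_doc)

print(raw_doc in SOURCE)    # True
print(clean_doc in SOURCE)  # False: indentation and trailing blank lines were stripped
```

Because the cleaned text is not a substring of the source, a plain `source.replace(doc, "")` does not remove the docstring.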
Upvotes: 4
Views: 673
Reputation: 5418
In case anyone is still looking for a solution for this, this is how I managed to build it:
from ast import Constant, Expr, FunctionDef, Module, parse
from inspect import getsource
from textwrap import dedent
from types import FunctionType
from typing import cast


def get_source_without_docstring(obj: FunctionType) -> str:
    # Get cleanly indented source code of the function
    obj_source = dedent(getsource(obj))

    # Parse the source code into an Abstract Syntax Tree.
    # The root of this tree is a Module node.
    module: Module = parse(obj_source)

    # The first child of a Module node is FunctionDef node that represents
    # the function definition. We cast module.body[0] to FunctionDef for type safety.
    function_def = cast(FunctionDef, module.body[0])

    # The first statement of a function could be a docstring, which in AST
    # is represented as an Expr node. To remove the docstring, we need to find
    # this Expr node.
    first_stmt = function_def.body[0]

    # Check if the first statement is a docstring (a constant str expression)
    if (
        isinstance(first_stmt, Expr)
        and isinstance(first_stmt.value, Constant)
        and isinstance(first_stmt.value.value, str)
    ):
        # Split the original source code by lines
        code_lines: list[str] = obj_source.splitlines()

        # Delete the lines corresponding to the docstring from the list.
        # Note: We are using 0-based list index, but the line numbers in the
        # parsed AST nodes are 1-based. So, we need to subtract 1 from the
        # 'lineno' property of the node.
        del code_lines[first_stmt.lineno - 1 : first_stmt.end_lineno]

        # Join the remaining lines back into a single string
        obj_source = "\n".join(code_lines)

    # Return the source code of function without docstrings
    return obj_source
Note: code by myself, comments by OpenAI's GPT
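A variation on the same AST idea, as a rough sketch rather than a drop-in replacement: instead of deleting lines, remove the docstring nodes and regenerate the source with `ast.unparse` (Python 3.9+). The helper names `strip_docstrings` and `source_hash` are hypothetical. This also covers classes, async functions, nested definitions and module docstrings, and because `ast.unparse` normalizes formatting and drops comments, those stop affecting the hash as well.

```python
import ast
import hashlib

# Node types that can carry a docstring as their first statement.
_DOC_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def strip_docstrings(source: str) -> str:
    """Return normalized source with every docstring removed (hypothetical helper)."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, _DOC_OWNERS) or not node.body:
            continue
        first = node.body[0]
        if (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        ):
            # Keep the body non-empty so the result is still valid Python.
            node.body = node.body[1:] or [ast.Pass()]
    return ast.unparse(tree)


def source_hash(source: str) -> str:
    """SHA-256 of the docstring-free, normalized source (hypothetical helper)."""
    return hashlib.sha256(strip_docstrings(source).encode()).hexdigest()


a = 'def f(x):\n    """Doc A."""\n    return x + 1\n'
b = 'def f(x):\n    """Doc B,\n    longer."""\n    return x + 1\n'
print(source_hash(a) == source_hash(b))  # True
```

For a live function, the input would be `dedent(getsource(obj))`, as in the answer above.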
Upvotes: 2
Reputation: 6204
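There is a simple solution:
def fun(a, b):
    '''hahah'''
    return a + b

# we simply delete the docstring
fun.__doc__ = ''
help(fun)
This code yields:
Help on function fun in module __main__:

fun(a, b)
Note that this only clears the runtime attribute; inspect.getsource still returns the original text, docstring included.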
Upvotes: 0
Reputation: 6209
If you just want to hash the body of a function, regardless of the docstring, you can use the function's __code__ attribute.
It gives access to a code object whose bytecode (co_code) is not affected by the docstring.
Unfortunately, with this approach you cannot get a readable version of the source.
def foo():
    """Prints 'foo'"""
    print('foo')

# The exact bytes shown below depend on the Python version.
print(foo.__doc__)  # Prints 'foo'
print(foo.__code__.co_code)  # b't\x00d\x01\x83\x01\x01\x00d\x02S\x00'
foo.__doc__ += 'pouet'
print(foo.__doc__)  # Prints 'foo'pouet
print(foo.__code__.co_code)  # b't\x00d\x01\x83\x01\x01\x00d\x02S\x00'
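One caveat worth illustrating with a small sketch, assuming CPython: `co_code` holds instructions and indices into the constants table, not the constant values themselves, so a hash of `co_code` alone also misses changes to literals. The helper `bytecode_fingerprint` and the `greet_*` functions are made-up names.

```python
import hashlib


def bytecode_fingerprint(func) -> str:
    """Hash only the raw bytecode of func (hypothetical helper)."""
    return hashlib.sha256(func.__code__.co_code).hexdigest()


def greet_a():
    """First docstring."""
    print('foo')


def greet_b():
    """A completely different docstring."""
    print('foo')


def greet_c():
    """First docstring."""
    print('bar')  # different literal, same instructions


# Docstring changes are ignored, as intended.
print(bytecode_fingerprint(greet_a) == bytecode_fingerprint(greet_b))  # True
# But so is the changed string literal.
print(bytecode_fingerprint(greet_a) == bytecode_fingerprint(greet_c))  # True
```

If literal changes must invalidate the hash, the constants would have to be hashed too, which this sketch does not attempt.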
Upvotes: 1
Reputation: 1599
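One approach is to delete the docstring from the source using regex:
import re
# Keep the colon and whitespace (group 1); DOTALL lets .*? span multi-line docstrings.
nodoc = re.sub(r"(:\s*)'''.*?'''", r"\1", source, flags=re.DOTALL)
nodoc = re.sub(r'(:\s*)""".*?"""', r"\1", nodoc, flags=re.DOTALL)
This is a heuristic: any triple-quoted string that directly follows a colon is removed too. It currently works for functions and classes only; maybe someone can find a pattern for modules too.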
Upvotes: 0