Fizi
Fizi

Reputation: 1851

python: capture docstring as an exact match using regex

Say I have the following function

def func(self, arg1=None, arg2=None):
    """
    Returns the product for the specified arg1 and arg2.
    Args:
        arg1: argument 1
        arg2: argument 2
    Returns:
        product of arg1 for arg2.

    DAL Owner: Team IT
    Data Owner: Team A
    """
    return arg1*arg2

I have need to match only whatever is inside the triple quotations. I have been able to do match it as a group using the following regex

def func.*?\):.*?\"\"\"(.*)\"\"\"

But it does not allow me to string replace easily in python. So I have a need to capture the docstring as a match instead of a group. I tried lookbehind (?<=def f2) but then ran into the variable length not allowed in lookbehind problem. Any ideas as to how to go about this will be deeply appreciated.

UPDATE:

I really would like to know if this can be done with an exact match i.e. not having to use multiple capturing groups

Upvotes: 2

Views: 553

Answers (2)

CertainPerformance
CertainPerformance

Reputation: 370679

If you want to match only the portion inside the """s, one option is to use the regex module, which has support for the \K token, which effectively forgets everything that was previously matched. Utilize this by matching (normally) the def func up until the """s, then use \K to reset the starting point of the match to the current position, then lazy-repeat any character until lookahead matches another """:

import regex
input  = '''
def func(self, arg1=None, arg2=None):
    """
    Returns the product for the specified arg1 and arg2.
    Args:
        arg1: argument 1
        arg2: argument 2
    Returns:
        product of arg1 for arg2.

    DAL Owner: Team IT
    Data Owner: Team A
    """
    return arg1*arg2
'''
replaced = regex.sub(r'(?s)def func\(.*?"""\K.*?(?=""")', 'replacement', input)
print(replaced)

Output:

def func(self, arg1=None, arg2=None):
    """replacement"""
    return arg1*arg2

\K, when available, can be a more flexible replacement for lookbehind.

Upvotes: 2

Sweeper
Sweeper

Reputation: 270980

If you just want to replace easily, you capture everything else into groups:

(def func.*?\):.*?\"\"\").*(\"\"\")

Then to replace, you just write $1xxx$2 as the replacement, with xxx being the actual replacement you want.

Try it here.

Upvotes: 0

Related Questions