Bartosz Radaczyński

Reputation: 18564

How to skip the docstring using regex

I'm trying to insert some import lines into a Python source file, but I would ideally like to place them right after the initial docstring. Let's say I load the file into the lines variable like this:

lines = open('filename.py').readlines()

How can I find the line number where the docstring ends?

Upvotes: 2

Views: 1290

Answers (3)

The Unfun Cat

Reputation: 31908

Here is a function, based on Brian's brilliant answer, that you can use to split a file into docstring and code:

import tokenize

def split_docstring_and_code(infile):
    insert_index = None
    with open(infile) as f:
        for tok, text, (srow, scol), (erow, ecol), l in tokenize.generate_tokens(f.readline):
            if tok in (tokenize.COMMENT, tokenize.NL):
                continue  # Skip comments and blank lines before the docstring
            elif tok == tokenize.STRING:
                insert_index = erow, ecol
                break
            else:
                break  # No docstring found

    # Re-read the file as lines and split after the docstring's last row
    with open(infile) as f:
        lines = f.readlines()
    if insert_index is not None:
        erow = insert_index[0]
        return "".join(lines[:erow]), "".join(lines[erow:])
    else:
        return "", "".join(lines)

It assumes that the line that ends the docstring does not contain additional code past the closing delimiter of the string.

Upvotes: 0

Brian

Reputation: 119211

Rather than using a regex or relying on specific formatting, you could use Python's tokenize module.

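import tokenize

insert_index = None
with open(filename) as f:  # filename: path of the Python source file
    for tok, text, (srow, scol), (erow, ecol), l in tokenize.generate_tokens(f.readline):
        if tok in (tokenize.COMMENT, tokenize.NL):
            continue  # Skip comments and blank lines before the docstring
        elif tok == tokenize.STRING:
            insert_index = erow, ecol  # Row and column where the docstring ends
            break
        else:
            break  # No docstring found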

This way you can even handle pathological cases like:

# Comment
# """Not the real docstring"""
' this is the module\'s \
docstring, containing:\
""" and having code on the same line following it:'; this_is_code=42

exactly as Python would handle them.
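To actually insert the imports at that position, something along these lines might work. It is only a sketch under a few assumptions: `docstring_end` and `insert_after_docstring` are hypothetical helper names, the source is handled as a string rather than a file, and when no docstring is found the new code is simply prepended (a shebang or encoding comment would need extra care).

```python
import io
import tokenize


def docstring_end(source):
    """Return (row, col) of the end of the module docstring, or None."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL):
            continue  # comments and blank lines may precede the docstring
        if tok.type == tokenize.STRING:
            return tok.end  # 1-based row, 0-based column
        return None  # first significant token is not a string
    return None


def insert_after_docstring(source, new_code):
    """Return source with new_code (ending in a newline) placed after the docstring."""
    end = docstring_end(source)
    if end is None:
        # Simplification: no docstring, so put the new code at the very top.
        return new_code + source
    row, col = end
    lines = source.splitlines(keepends=True)
    line = lines[row - 1]
    if not line.endswith("\n"):
        line += "\n"  # docstring is the last line and lacks a newline
    head, tail = line[:col], line[col:]
    rest = tail.lstrip(" \t;")
    if not rest.strip() or rest.startswith("#"):
        # Only whitespace or a comment follows the docstring: keep the line whole.
        lines[row - 1] = line
        lines.insert(row, new_code)
    else:
        # Code follows on the same line (e.g. after ';'): split the line there.
        lines[row - 1] = head + "\n" + new_code + rest
    return "".join(lines)
```

For example, `insert_after_docstring(open('filename.py').read(), "import os\n")` would return the modified source text, which you can then write back to the file.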

Upvotes: 11

John Millikin

Reputation: 200756

If you're using the standard docstring format, you can do something like this:

count = 0
for line in lines:
    if count < 2:
        # Before, inside, or on a delimiter line of the docstring
        if line.startswith('"""'):
            count += 1
        continue
    # Line is after docstring

Might need some adaptation for files with no docstrings, but if your files are formatted consistently it should be easy enough.
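One possible adaptation for files with no docstring, or with a one-line docstring, is sketched below. It still assumes the docstring is delimited by triple double quotes and is preceded only by blank lines or comments; `docstring_end_line` is a hypothetical name, not something from the standard library.

```python
def docstring_end_line(lines):
    """Return the index of the first line after the docstring, or 0 if none is found."""
    in_docstring = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not in_docstring:
            if not stripped or stripped.startswith("#"):
                continue  # blank lines and comments may precede the docstring
            if stripped.startswith('"""'):
                if stripped.count('"""') >= 2:
                    return i + 1  # one-line docstring: opens and closes here
                in_docstring = True
                continue
            return 0  # the first real line is code, so there is no docstring
        elif '"""' in stripped:
            return i + 1  # closing delimiter found
    return 0  # empty file or unterminated docstring


if __name__ == "__main__":
    sample = ['# comment\n', '"""\n', 'Module docs.\n', '"""\n', 'x = 1\n']
    index = docstring_end_line(sample)
    sample[index:index] = ["import os\n"]  # insert the imports after the docstring
    print("".join(sample), end="")
```

Unlike the tokenize approach, this will be fooled by single-quoted docstrings or by code that follows the closing delimiter on the same line.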

Upvotes: 2
