Reputation: 2486
I want to strip all Python docstrings out of a file using a simple search and replace, and the following (extremely) simplistic regex does the job for one-line docstrings:
""".*"""
How can I extend it to work with multi-line docstrings?
I tried including \s in a number of places, to no avail.
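A minimal check of why the one-line pattern stops working (the sample strings are made up for illustration): by default . matches any character except a newline, so """.*""" cannot cross a line break. Python's re.DOTALL flag lifts that restriction.

```python
import re

# Hypothetical sample sources: one single-line docstring, one multi-line docstring.
one_line = 'def f():\n    """Return nothing."""\n    pass\n'
multi_line = 'def g():\n    """\n    Spans\n    several lines.\n    """\n    pass\n'

pattern = r'""".*"""'

# '.' stops at a newline, so only the single-line docstring matches.
print(bool(re.search(pattern, one_line)))    # True
print(bool(re.search(pattern, multi_line)))  # False

# With re.DOTALL, '.' also matches newlines, so the multi-line docstring is found.
print(bool(re.search(pattern, multi_line, flags=re.DOTALL)))  # True
```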
Upvotes: 2
Views: 2094
Reputation: 44128
Sometimes there are multiline strings that are not docstrings; for example, a complicated SQL query may extend across multiple lines. The following code only looks for multiline strings that appear immediately before a class definition or immediately after a def line.
import re

input_str = """'''
This is a class level docstring
'''
class Article:
    def print_it(self):
        '''
        method level docstring
        '''
        print('Article')
        sql = '''
        SELECT * FROM mytable
        WHERE DATE(purchased) >= '2020-01-01'
        '''
"""

# A triple-quoted string directly followed by a class definition
doc_reg_1 = r'("""|\'\'\')([\s\S]*?)(\1\s*)(?=class)'
# A triple-quoted string on the line(s) right after a def header
doc_reg_2 = r'(\s+def\s+.*:\s*)\n(\s*"""|\s*\'\'\')([\s\S]*?)(\2[^\n\S]*)'

input_str = re.sub(doc_reg_1, '', input_str)
input_str = re.sub(doc_reg_2, r'\1', input_str)  # keep only the def line
print(input_str)
Prints:
class Article:
    def print_it(self):
        print('Article')
        sql = '''
        SELECT * FROM mytable
        WHERE DATE(purchased) >= '2020-01-01'
        '''
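To illustrate why the SQL case matters, here is a sketch on a hypothetical snippet: a pattern that simply removes every triple-quoted block also deletes the query, while doc_reg_2 from the code above only removes the string that directly follows the def line.

```python
import re

# Hypothetical snippet: a method docstring plus a multi-line SQL string.
source = (
    "class Article:\n"
    "    def print_it(self):\n"
    '        """\n'
    "        method level docstring\n"
    '        """\n'
    '        sql = """\n'
    "        SELECT * FROM mytable\n"
    '        """\n'
)

# Naive: remove every triple-quoted block, docstring or not.
naive = r'("""|\'\'\')[\s\S]*?\1'
# Same doc_reg_2 as above: only a string right after a def header.
doc_reg_2 = r'(\s+def\s+.*:\s*)\n(\s*"""|\s*\'\'\')([\s\S]*?)(\2[^\n\S]*)'

naive_result = re.sub(naive, '', source)
targeted_result = re.sub(doc_reg_2, r'\1', source)

print(naive_result)     # the SQL query is gone, leaving a bare "sql = "
print(targeted_result)  # docstring removed, SQL string kept
```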
Upvotes: 0
Reputation: 626853
As you cannot use an inline s (DOTALL) modifier, the usual workaround for matching any character is a character class that combines a shorthand character class with its opposite:
"""[\s\S]*?"""
or
"""[\d\D]*?"""
or
"""[\w\W]*?"""
Each of these will match """, then any 0 or more characters, as few as possible (since *? is a lazy quantifier), and then the trailing """.
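A small sketch, on a made-up module, checking that the three spellings are interchangeable and showing why the lazy *? matters once a file contains more than one docstring. The inline (?s) form is included only for comparison: Python's re module accepts it, but a given search-and-replace tool may not, which is the situation the character-class workaround is for.

```python
import re

# Hypothetical module with two docstrings and code in between.
source = '''"""Module docstring
spanning two lines."""

def f():
    """Function docstring."""
    return 1
'''

# The three character-class spellings, plus the inline DOTALL form for comparison.
lazy_patterns = [
    r'"""[\s\S]*?"""',
    r'"""[\d\D]*?"""',
    r'"""[\w\W]*?"""',
    r'(?s)""".*?"""',
]
results = [re.sub(p, '', source) for p in lazy_patterns]
print(all(r == results[0] for r in results))  # True
print(repr(results[0]))  # '\n\ndef f():\n    \n    return 1\n'

# A greedy * runs from the first """ to the last one, swallowing the def line too.
greedy = re.sub(r'"""[\s\S]*"""', '', source)
print(repr(greedy))  # '\n    return 1\n'
```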
Upvotes: 5