Reputation:
I have a Python editor where the user enters a script, which is then placed inside a main method behind the scenes, with every line indented. The problem is that if the user has a multi-line string, the indentation applied to the whole script also affects the string, inserting a tab at the start of every line of it. A problem script could be as simple as:
"""foo
bar
foo2"""
So when in the main method it would look like:
def main():
    """foo
    bar
    foo2"""
and the string would now have an extra tab at the beginning of every line.
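A minimal sketch that reproduces the problem described above, assuming the editor wraps the script roughly like this. The `return text` line and the use of `textwrap.indent` are illustrative assumptions, not the asker's actual code.

```python
import textwrap

# Hypothetical user script; the final line is added only so the result can be inspected.
user_script = 'text = """foo\nbar\nfoo2"""\nreturn text\n'

# Naive wrapping: prefix every line with a tab and put it under def main().
wrapped = "def main():\n" + textwrap.indent(user_script, "\t")

namespace = {}
exec(wrapped, namespace)
print(repr(namespace["main"]()))  # the tab now appears inside the string
```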
Upvotes: 119
Views: 59486
Reputation: 1684
This does the trick, if I understand the question correctly. lstrip() removes leading whitespace, so it will remove tabs as well as spaces.
from os import linesep
def dedent(message):
    return linesep.join(line.lstrip() for line in message.splitlines())
Example:
name='host'
config_file='/Users/nmellor/code/cold_fusion/end-to-end/config/stage.toml'
message = f"""Missing env var or configuration entry for 'host'.
              Please add '{name}' entry to file
              '{config_file}'
              or export environment variable 'mqtt_{name}' before
              running the program.
           """
>>> print(message)
Missing env var or configuration entry for 'host'.
              Please add 'host' entry to file
              '/Users/nmellor/code/cold_fusion/end-to-end/config/stage.toml'
              or export environment variable 'mqtt_host' before
              running the program.
>>> print(dedent(message))
Missing env var or configuration entry for 'host'.
Please add 'host' entry to file
'/Users/nmellor/code/cold_fusion/end-to-end/config/stage.toml'
or export environment variable 'mqtt_host' before
running the program.
The above solution will remove ALL indentation. If you want to remove only the indentation that is common to the whole multi-line string, use textwrap.dedent(). But take care that every non-blank line, including the first, is indented; otherwise textwrap.dedent() finds no common indentation and removes nothing.
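A short sketch of the caveat above. The variable names are made up for illustration; the backslash after the opening quotes is one common way to keep the first line aligned with the rest.

```python
import textwrap

# The first line starts right after the opening quotes, so it has no
# indentation and the common prefix is empty: nothing is removed.
unindented_first = """foo
    bar
    baz
    """

# A backslash after the opening quotes moves the text to the next line,
# so every non-blank line shares the same four-space prefix.
indented_first = """\
    foo
    bar
    baz
    """

print(repr(textwrap.dedent(unindented_first)))
print(repr(textwrap.dedent(indented_first)))
```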
Upvotes: 2
Reputation: 4453
I had a similar issue: I wanted my triple quoted string to be indented, but I didn't want the string to have all those spaces at the beginning of each line. I used re to correct my issue:
print(re.sub('\n *','\n', f"""Content-Type: multipart/mixed; boundary="===============9004758485092194316=="
                              MIME-Version: 1.0
                              Subject: Get the reader's attention here!
                              To: [email protected]
                              --===============9004758485092194316==
                              Content-Type: text/html; charset="us-ascii"
                              MIME-Version: 1.0
                              Content-Transfer-Encoding: 7bit
                              Very important message goes here - you can even use <b>HTML</b>.
                              --===============9004758485092194316==--
                              """))
Above, I was able to keep my code indented, but the string was left trimmed essentially. All spaces at the beginning of each line were deleted. This was important since any spaces in front of the SMTP or MIME specific lines would break the email message.
The tradeoff I made was that I left the Content-Type on the first line because the regex I was using didn't remove the initial \n (which broke email). If it bothered me enough, I guess I could have added an lstrip like this:
print(re.sub('\n *','\n', f"""
                              Content-Type: ...
                              """).lstrip())
After reading this 10-year-old page, I decided to stick with re.sub since I didn't truly understand all the nuances of textwrap and inspect.
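The same substitution on a small, made-up message, as a rough illustration of what it does and does not touch. The helper name `strip_leading_spaces` is hypothetical. Note that the pattern only removes spaces, so tab indentation would be left in place.

```python
import re

def strip_leading_spaces(text):
    # Drop the run of spaces that follows each newline; the first line is untouched.
    return re.sub('\n *', '\n', text)

message = """Subject: hello
             To: someone

             body"""

cleaned = strip_leading_spaces(message)
print(repr(cleaned))  # blank lines survive, indentation does not
```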
Upvotes: 0
Reputation: 1587
From what I see, a better answer here might be inspect.cleandoc, which does much of what textwrap.dedent does but also fixes the problems that textwrap.dedent has with the leading line.
The below example shows the differences:
>>> import textwrap
>>> import inspect
>>> x = """foo bar
    baz
    foobar
    foobaz
    """
>>> inspect.cleandoc(x)
'foo bar\nbaz\nfoobar\nfoobaz'
>>> textwrap.dedent(x)
'foo bar\n    baz\n    foobar\n    foobaz\n'
>>> y = """
...     foo
...     bar
... """
>>> inspect.cleandoc(y)
'foo\nbar'
>>> textwrap.dedent(y)
'\nfoo\nbar\n'
>>> z = """\tfoo
bar\tbaz
"""
>>> inspect.cleandoc(z)
'foo\nbar     baz'
>>> textwrap.dedent(z)
'\tfoo\nbar\tbaz\n'
Note that inspect.cleandoc also expands internal tabs to spaces. This may be inappropriate for one's use case, but works fine for me.
Upvotes: 84
Reputation: 8369
I wanted to preserve exactly what is between the triple-quote lines, removing common leading indent only. I found that textwrap.dedent and inspect.cleandoc didn't do it quite right, so I wrote this one. It uses os.path.commonprefix.
import re
from os.path import commonprefix
def ql(s, eol=True):
    lines = s.splitlines()
    l0 = None
    if lines:
        l0 = lines.pop(0) or None
    common = commonprefix(lines)
    indent = re.match(r'\s*', common)[0]
    n = len(indent)
    lines2 = [l[n:] for l in lines]
    if not eol and lines2 and not lines2[-1]:
        lines2.pop()
    if l0 is not None:
        lines2.insert(0, l0)
    s2 = "\n".join(lines2)
    return s2
This can quote any string with any indent. I wanted it to include the trailing newline by default, but with an option to remove it so that it can quote any string neatly.
Example:
print(ql("""
    Hello
    |\---/|
    | o_o |
     \_^_/
    """))
print(ql("""
        World
        |\---/|
        | o_o |
         \_^_/
    """))
The second string has 4 spaces of common indentation because the final """ is indented less than the quoted text:
Hello
|\---/|
| o_o |
 \_^_/
    World
    |\---/|
    | o_o |
     \_^_/
I thought this was going to be simpler, otherwise I wouldn't have bothered with it!
Upvotes: 2
Reputation: 43039
Showing the difference between textwrap.dedent and inspect.cleandoc with a little more clarity:
import textwrap
import inspect
string1="""String
with
no indentation
    """
string2="""String
    with
    indentation
    """
print('string1 plain=' + repr(string1))
print('string1 inspect.cleandoc=' + repr(inspect.cleandoc(string1)))
print('string1 textwrap.dedent=' + repr(textwrap.dedent(string1)))
print('string2 plain=' + repr(string2))
print('string2 inspect.cleandoc=' + repr(inspect.cleandoc(string2)))
print('string2 textwrap.dedent=' + repr(textwrap.dedent(string2)))
Output
string1 plain='String\nwith\nno indentation\n    '
string1 inspect.cleandoc='String\nwith\nno indentation\n    '
string1 textwrap.dedent='String\nwith\nno indentation\n'
string2 plain='String\n    with\n    indentation\n    '
string2 inspect.cleandoc='String\nwith\nindentation'
string2 textwrap.dedent='String\n    with\n    indentation\n'
string1="""
String
with
no indentation
    """
string2="""
    String
    with
    indentation
    """
print('string1 plain=' + repr(string1))
print('string1 inspect.cleandoc=' + repr(inspect.cleandoc(string1)))
print('string1 textwrap.dedent=' + repr(textwrap.dedent(string1)))
print('string2 plain=' + repr(string2))
print('string2 inspect.cleandoc=' + repr(inspect.cleandoc(string2)))
print('string2 textwrap.dedent=' + repr(textwrap.dedent(string2)))
Output
string1 plain='\nString\nwith\nno indentation\n    '
string1 inspect.cleandoc='String\nwith\nno indentation\n    '
string1 textwrap.dedent='\nString\nwith\nno indentation\n'
string2 plain='\n    String\n    with\n    indentation\n    '
string2 inspect.cleandoc='String\nwith\nindentation'
string2 textwrap.dedent='\nString\nwith\nindentation\n'
Upvotes: 2
Reputation: 156268
What follows the first line of a multiline string is part of the string, and not treated as indentation by the parser. You may freely write:
def main():
    """foo
bar
foo2"""
    pass
and it will do the right thing.
On the other hand, that's not readable, and Python knows it. So if a docstring contains whitespace in its second line, that amount of whitespace is stripped off when you use help() to view the docstring. Thus, help(main) and the below help(main2) produce the same help info.
def main2():
    """foo
    bar
    foo2"""
    pass
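A small check of the claim above, using inspect.getdoc as a stand-in for what help() displays (help() goes through pydoc, which cleans docstrings in a comparable way). The raw `__doc__` value is deliberately not compared here, because whether it still contains the indentation depends on the Python version.

```python
import inspect

def main():
    """foo
bar
foo2"""
    pass

def main2():
    """foo
    bar
    foo2"""
    pass

# Both cleaned docstrings come out the same, regardless of how the
# continuation lines were indented in the source.
print(repr(inspect.getdoc(main)))
print(repr(inspect.getdoc(main2)))
```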
Upvotes: 22
Reputation: 3102
So if I get it correctly, you take whatever the user inputs, indent it properly and add it to the rest of your program (and then run that whole program).
So after you put the user input into your program, you could run a regex that basically takes that forced indentation back. Something like: within triple quotes, replace every "new line marker" followed by four spaces (or a tab) with only a "new line marker".
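One way the regex idea above might look, as a rough sketch only. The helper name `strip_forced_indent` is hypothetical, it only recognises `"""` strings, and it does not understand comments, `'''` strings, or quotes nested inside other strings, so it is not a robust parser.

```python
import re

# Non-greedy match of everything between a pair of triple double quotes.
TRIPLE_QUOTED = re.compile(r'"""(.*?)"""', re.DOTALL)

def strip_forced_indent(wrapped_source, indent="\t"):
    """Remove one level of added indentation inside triple-quoted strings."""
    def fix(match):
        # Inside the string, turn "newline + indent" back into a bare newline.
        return match.group(0).replace("\n" + indent, "\n")
    return TRIPLE_QUOTED.sub(fix, wrapped_source)

# Hypothetical wrapped script, as it might look after the editor indents it.
wrapped = 'def main():\n\ttext = """foo\n\tbar\n\tfoo2"""\n\treturn text\n'
fixed = strip_forced_indent(wrapped)

namespace = {}
exec(fixed, namespace)
print(repr(namespace["main"]()))
```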
Upvotes: -15
Reputation: 5191
textwrap.dedent from the standard library is there to automatically undo the wacky indentation.
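For illustration, a minimal sketch of how textwrap.dedent is typically applied to an indented triple-quoted string; the function body is made up for the example.

```python
import textwrap

def main():
    # The backslash keeps the first line on the same indentation level
    # as the others, so dedent can strip the common eight spaces.
    text = textwrap.dedent("""\
        foo
        bar
        foo2""")
    return text

print(repr(main()))
```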
Upvotes: 175
Reputation: 4500
The only way I see is to strip the first n tabs from each line starting with the second, where n is the known indentation of the main method.
If that indentation is not known beforehand, you can add a trailing newline before inserting it and strip the number of tabs found on the last line.
The third solution is to parse the data, find the beginning of a multi-line quote, and not add your indentation to any following line until the quote is closed.
I think there may be a better solution, though.
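The third idea could be sketched with the standard tokenize module, which reports where each token starts and ends. This is an illustration under assumptions, not a tested editor component: the function name `wrap_in_main` is hypothetical, and it relies on string literals being the tokens that span several lines.

```python
import io
import tokenize

def wrap_in_main(user_source, indent="    "):
    """Indent user_source under def main(), leaving string continuation lines alone."""
    protected = set()  # 1-based numbers of lines that begin inside a multi-line token
    for tok in tokenize.generate_tokens(io.StringIO(user_source).readline):
        start_row, end_row = tok.start[0], tok.end[0]
        if end_row > start_row:
            # Every line after the token's first line is inside the string.
            protected.update(range(start_row + 1, end_row + 1))
    out = ["def main():\n"]
    for number, line in enumerate(io.StringIO(user_source), start=1):
        out.append(line if number in protected else indent + line)
    return "".join(out)

# Hypothetical user script; "return text" is there only to inspect the result.
user_script = 'text = """foo\nbar\nfoo2"""\nreturn text\n'
wrapped = wrap_in_main(user_script)

namespace = {}
exec(wrapped, namespace)
print(repr(namespace["main"]()))
```

Compared with stripping tabs afterwards, this never modifies the string contents in the first place, at the cost of tokenizing the user's script before wrapping it.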
Upvotes: 1