Reputation: 6395
What is the easiest way to "interpret" formatting control characters in a string, to show the results as if they were printed? For simplicity, I will assume there are no newlines in the string.
So for example,
>>> sys.stdout.write('foo\br')
shows for, therefore interpret('foo\br') should be 'for'.
>>> sys.stdout.write('foo\rbar')
shows bar, therefore interpret('foo\rbar') should be 'bar'.
I can write a regular expression substitution here, but in the case of '\b' replacement, it would have to be applied recursively until there are no more occurrences. It would be quite complex if done without recursion.
Is there an easier way?
Upvotes: 1
Views: 976
Reputation: 101959
Python does not have any built-in or standard library module for doing this.
However, if you only care about simple control characters like \r, \b and \n, you can write a simple function to handle this:
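def interpret(text):
    lines = []
    current_line = []
    for char in text:
        if char == '\n':
            # finish the current line and start a new one
            lines.append(''.join(current_line))
            current_line = []
        elif char == '\r':
            # carriage return: discard what was written on this line
            current_line.clear()
            # del current_line[:]  # in old python versions
        elif char == '\b':
            # backspace: drop the last character, if there is one
            del current_line[-1:]
        else:
            current_line.append(char)
    if current_line:
        # join the characters; appending the list itself would break '\n'.join
        lines.append(''.join(current_line))
    return '\n'.join(lines)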
You can extend the function to handle any control character you want. For example, you might want to ignore control characters that are not actually displayed in a terminal (e.g. the bell, \a).
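As a minimal sketch of that kind of extension, the variant below takes a set of characters to skip. The name interpret_ignoring and the ignored parameter are made up for illustration, and the default set only contains the bell; which characters to treat as invisible is an assumption you would adjust for your own terminal.

```python
def interpret_ignoring(text, ignored=frozenset('\a')):
    """Same line/backspace handling as above, but skip characters in `ignored`.

    `interpret_ignoring` and `ignored` are hypothetical names for this sketch.
    """
    lines = []
    current_line = []
    for char in text:
        if char in ignored:
            continue  # e.g. the bell: nothing is displayed
        if char == '\n':
            lines.append(''.join(current_line))
            current_line = []
        elif char == '\r':
            current_line.clear()
        elif char == '\b':
            del current_line[-1:]
        else:
            current_line.append(char)
    if current_line:
        lines.append(''.join(current_line))
    return '\n'.join(lines)


print(repr(interpret_ignoring('foo\a\br')))  # 'for'
```

Passing a larger set, for example ignored=frozenset('\a\x00'), drops more characters without touching the rest of the loop.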
Upvotes: 1
Reputation: 33950
UPDATE: after 30 minutes of asking for clarifications and an example string, it turns out the question is actually quite different: "How to repeatedly apply formatting control characters (backspace) to a Python string?" In that case, yes, you apparently need to apply the regex/function repeatedly until you stop getting matches. SOLUTION:
import re

def repeated_re_sub(pattern, sub, s, flags=re.U):
    """Match-and-replace repeatedly until we run out of matches..."""
    patc = re.compile(pattern, flags)
    sold = ''
    while sold != s:
        sold = s
        # Uncomment to trace each pass:
        # print("patc=>%s< sold=>%s< s=>%s<" % (patc, sold, s))
        s = patc.sub(sub, sold)
    return s

# The pattern is a non-raw string, so \b here is the backspace character
# (\x08), not the regex word boundary.
print(repeated_re_sub('[^\b]\b', '', 'abc\b\x08de\b\bfg'))  # prints afg
#print(repeated_re_sub('.\b', '', 'abcd\b\x08e\b\bfg'))
[multiple previous answers, asking for clarifications and pointing out that both re.sub(...) or string.replace(...) could be used to solve the problem, non-recursively.]
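To illustrate why the substitution has to be repeated, here is a small sketch comparing one pass with a loop that runs until the string stops changing. The names BACKSPACE_PAIR and strip_backspaces are invented for this example; the pattern is the same '[^\b]\b' used above.

```python
import re

# One non-backspace character followed by a backspace. The string is not raw,
# so \b is the backspace character (\x08), not a regex word boundary.
BACKSPACE_PAIR = re.compile('[^\b]\b')


def strip_backspaces(s):
    """Repeat the substitution until the string stops changing."""
    previous = None
    while previous != s:
        previous = s
        s = BACKSPACE_PAIR.sub('', s)
    return s


# Two consecutive backspaces: the first pass can only consume one of them,
# because the second backspace has no non-backspace character before it yet.
single_pass = BACKSPACE_PAIR.sub('', 'ab\b\bc')
print(repr(single_pass))                  # 'a\x08c'
print(repr(strip_backspaces('ab\b\bc')))  # 'c'
```

Note that a backspace at the very start of the string never matches this pattern, so it is left in the result.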
Upvotes: 0
Reputation: 60147
If efficiency doesn't matter, a simple stack would work fine:
string = "foo\rbar\rbash\rboo\b\bba\br"
res = []
for char in string:
    if char == "\r":
        res.clear()
    elif char == "\b":
        if res: del res[-1]
    else:
        res.append(char)
"".join(res)
#>>> 'bbr'
Otherwise, I think this is about as fast as you can hope for in complex cases:
string = "foo\rbar\rbash\rboo\b\bba\br"
try:
    string = string[string.rindex("\r")+1:]
except ValueError:
    pass
split_iter = iter(string.split("\b"))
res = list(next(split_iter, ''))
for part in split_iter:
    if res: del res[-1]
    res.extend(part)
"".join(res)
#>>> 'bbr'
Note that I haven't timed this.
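For anyone who wants to check, a rough way to time the two approaches is sketched below with the standard timeit module. The function names with_stack and with_split, the sample input and the repeat count are all arbitrary choices for illustration, and the numbers will depend on the machine and on how many \r and \b characters the input contains.

```python
import timeit


def with_stack(string):
    # The simple stack version from above, wrapped in a function.
    res = []
    for char in string:
        if char == "\r":
            res.clear()
        elif char == "\b":
            if res:
                del res[-1]
        else:
            res.append(char)
    return "".join(res)


def with_split(string):
    # The rindex/split version from above, wrapped in a function.
    try:
        string = string[string.rindex("\r") + 1:]
    except ValueError:
        pass
    split_iter = iter(string.split("\b"))
    res = list(next(split_iter, ""))
    for part in split_iter:
        if res:
            del res[-1]
        res.extend(part)
    return "".join(res)


sample = "foo\rbar\rbash\rboo\b\bba\br" * 1000  # arbitrary test input
assert with_stack(sample) == with_split(sample)

for fn in (with_stack, with_split):
    seconds = timeit.timeit(lambda: fn(sample), number=100)
    print(f"{fn.__name__}: {seconds:.4f} s for 100 runs")
```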
Upvotes: 1