Reputation: 1097
Suppose I have a string like 'a\tb'. If I print it, I see a and b separated by an actual tab character. But I want to see a\tb instead. How can I convert my string so it will print like that?
Upvotes: 57
Views: 61377
Reputation: 61635
Please beware that the problem is underspecified in general. The built-in repr function will give you the canonical representation of the string as a string literal, as shown in the other answers. However, that may not be what you want.
For every string except the empty string, there are multiple ways of specifying the string contents. Even something as simple as ' ' could be written as '\x20', or '\u0020', or '\U00000020'. All of these are the same string (and that's ignoring the choice of enclosing quotes).
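A quick check (Python 3) that these spellings really do produce one and the same string, and that repr settles on a single canonical form whichever spelling you started from:

```python
# Four different literal spellings of a single space character.
spellings = [' ', '\x20', '\u0020', '\U00000020']

# They all compare equal, so they are the same string value.
assert all(s == ' ' for s in spellings)

# The choice of enclosing quotes doesn't matter either.
assert 'a\tb' == "a\tb"

# repr() picks one canonical spelling for all of them.
print({repr(s) for s in spellings})  # prints {"' '"}
```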
Python's choices are not always what you might expect. Newlines will be represented as '\n', but backspace characters, for example, will be represented as a hex code ('\x08'), not as '\b'. On the other hand, even really fancy characters like emoji may very well be included literally, not escaped.
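To see these choices directly, here is a small comparison. It also uses the built-in ascii(), which works like repr() but additionally escapes every non-ASCII character; that is one existing alternative if the literal emoji is the part you don't want:

```python
samples = {'newline': '\n', 'backspace': '\b', 'emoji': '\U0001F60A'}

for name, s in samples.items():
    # repr() escapes \n by name, \b as a hex code, and keeps the printable emoji as-is.
    print(name, repr(s))
# newline '\n'
# backspace '\x08'
# emoji '😊'

# ascii() escapes non-ASCII characters too (lowercase hex digits).
print(ascii('\U0001F60A'))  # prints '\U0001f60a'
```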
If you want to change that behaviour, you will have to write it yourself, after defining the specific rules you want to apply. One useful tool for this is the translate method of strings, which simply applies a mapping to each Unicode code point in the input. The static method str.maketrans can help with creating such a mapping, but that's still underpowered: you're stuck giving a specific set of code points to translate (and their translations), and then everything not specified is left alone.
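For example, a minimal sketch with a hand-picked table (the choice of which characters to escape is just an assumption for illustration). Note that maketrans converts the character keys to code point integers, and any character missing from the table passes through unchanged:

```python
# Map each character we care about to the escape text we want to see.
table = str.maketrans({'\t': '\\t', '\n': '\\n'})

s = 'a\tb\nc'
print(s.translate(table))  # prints a\tb\nc

# Anything not in the table is left alone, e.g. a backspace:
print(repr('x\by'.translate(table)))  # prints 'x\x08y'
```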
If you want to convert large numbers of code points in a way that follows some kind of pattern, you might write a function for that. However, handling special cases, or multiple separate blocks of Unicode code points, could end up with a lot of tedious branching logic.
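As an illustration of the function-based approach, here is a hypothetical escape_controls helper. Its single rule (escape C0 control characters and DEL as \xNN, leave everything else alone) is an assumption chosen for the example; every additional rule or code point block would mean another branch in the loop:

```python
def escape_controls(s: str) -> str:
    """Hypothetical helper: escape C0 controls and DEL as \\xNN, keep the rest."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp < 0x20 or cp == 0x7F:
            # Pattern-based rule: one format string covers the whole range.
            out.append('\\x{:02x}'.format(cp))
        else:
            out.append(ch)
    return ''.join(out)


print(escape_controls('a\tb\x7f'))  # prints a\x09b\x7f
```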
Here is my attempt to get the best of both worlds. We create a dict subclass that implements __missing__ to dispatch to a function handling an appropriate character range, and caches the result. We initialize it with an iterable or mapping of hard-coded values (per the base dict constructor), *args that give (function, range) pairs (the function will be used to compute the result for characters whose numeric values fall in the range), and **kwargs (again per the base dict constructor). We accept Unicode characters as the keys, but translate will look up the numeric code point values, so the constructor converts the keys with ord.
class strtrans(dict):
    def __init__(self, iterable_or_mapping, *args, **kwargs):
        self._handlers = args
        temp = dict(iterable_or_mapping, **kwargs)
        super().__init__({ord(k): v for k, v in temp.items()})

    def __missing__(self, value):
        self[value] = value  # if no handler, leave the character untouched
        for func, r in self._handlers:
            if value in r:
                self[value] = func(value)
                break
        return self[value]  # not missing any more.
Let's test it:
>>> hardcoded = {'\n': '\\n', '\t': '\\t', '\b': '\\b'}
>>> # Using the `.format` method bound to a string is a quick way
>>> # to get a callable that accepts the input number and formats it in.
>>> # For uppercase, use :02X instead of :02x etc.
>>> backslash_x = ('\\x{:02x}'.format, range(256))
>>> backslash_u = ('\\u{:04x}'.format, range(256, 65536))
>>> backslash_U = ('\\U{:08x}'.format, range(65536, 0x110000))
>>> mapping = strtrans(hardcoded, backslash_x, backslash_u, backslash_U)
>>> test = '\n\t\b\x01\u4EBA\U0001F60A'
>>> print(test.translate(mapping)) # custom behaviour - note lowercase output
\n\t\b\x01\u4eba\U0001f60a
>>> print(repr(test)) # canonical representation, with enclosing quotes
'\n\t\x08\x01人😊'
>>> print(test) # your terminal's rendering may vary!
人😊
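Applied to the string from the question, a smaller sketch might hard-code \t and use a single handler for the remaining C0 control characters (that handler choice is just an assumption). The class is repeated so the snippet runs on its own; the last line shows the caching, since code points that have been looked up become ordinary dict entries:

```python
class strtrans(dict):
    def __init__(self, iterable_or_mapping, *args, **kwargs):
        self._handlers = args
        temp = dict(iterable_or_mapping, **kwargs)
        super().__init__({ord(k): v for k, v in temp.items()})

    def __missing__(self, value):
        self[value] = value  # if no handler, leave the character untouched
        for func, r in self._handlers:
            if value in r:
                self[value] = func(value)
                break
        return self[value]  # not missing any more.


# Hard-code \t; escape any other C0 control character as \xNN.
mapping = strtrans({'\t': '\\t'}, ('\\x{:02x}'.format, range(32)))

print('a\tb'.translate(mapping))   # prints a\tb
print('a\x01b'.translate(mapping))  # prints a\x01b

# Looked-up code points are now cached in the dict itself.
print(ord('a') in mapping, mapping[1])  # prints True \x01
```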
Upvotes: -3
Reputation: 306
(Python 2)
print ur'a\tb'
Note that in Python 3.x, u'' is equal to '', and the prefix ur is invalid.
(Python 3)
print(r'a\tb')
or equivalently:
print('a\\tb')
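To confirm that the two Python 3 spellings build the same string, and that it is not the string from the question, a quick check:

```python
raw_literal = r'a\tb'      # raw string: the backslash is kept as-is
escaped_literal = 'a\\tb'  # ordinary string: \\ is one backslash

# Both spell the same 4-character string: a, backslash, t, b.
assert raw_literal == escaped_literal
assert len(raw_literal) == 4

# ...which differs from the 3-character 'a\tb' containing a real tab.
assert raw_literal != 'a\tb' and len('a\tb') == 3

print(raw_literal)  # prints a\tb
```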
If you want to get the raw repr of an existing string, here is a small function (Python 3.6+):
def raw(string: str, replace: bool = False) -> str:
    r"""Returns the raw representation of a string. If replace is true, replace a single backslash's repr \\ with \."""
    r = repr(string)[1:-1]  # Strip the enclosing quotes from the representation
    if replace:
        r = r.replace('\\\\', '\\')
    return r
Examples:
>>> print(raw('1234'))
1234
>>> print('\t\n'); print('='*10); print(raw('\t\n'))
==========
\t\n
>>> print(raw('\r\\3'))
\r\\3
>>> print(raw('\r\\3', True))
\r\3
Note that this won't work for \N{...} Unicode escapes: they are resolved when the literal is parsed, so only a raw literal like r'\N{...}' keeps them. But I guess JSON doesn't have this :)
>>> print(raw('\N{PLUS SIGN}'))
+
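The reason is that the escape's name is gone once the literal is parsed. If you did want \N{...} output, one option is to rebuild it with the standard unicodedata module; named_escape below is a hypothetical helper (not part of the answer above) that falls back to the repr-based escape for characters without a Unicode name, such as control characters:

```python
import unicodedata

# Once parsed, '\N{PLUS SIGN}' is just '+'.
assert '\N{PLUS SIGN}' == '+'
assert repr('\N{PLUS SIGN}') == "'+'"

# A raw literal keeps the escape text, starting with a real backslash.
assert r'\N{PLUS SIGN}' == '\\N{PLUS SIGN}'


def named_escape(ch: str) -> str:
    """Hypothetical helper: rebuild a \\N{...} escape from the character's name."""
    name = unicodedata.name(ch, None)  # None for unnamed characters like '\t'
    return '\\N{%s}' % name if name else repr(ch)[1:-1]


print(named_escape('+'))   # prints \N{PLUS SIGN}
print(named_escape('\t'))  # prints \t
```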
Upvotes: 5
Reputation: 249502
print(repr('a\tb'))
repr() gives you the "representation" of the string rather than printing the string directly.
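Note that the representation includes the enclosing quotes, and it is a valid string literal, so the standard ast.literal_eval can parse it back to the original string. A short sketch:

```python
import ast

s = 'a\tb'
r = repr(s)

print(r)        # prints 'a\tb' (quotes included)
print(r[1:-1])  # prints a\tb (quotes sliced off)

# repr() output is a valid string literal, so it round-trips.
assert ast.literal_eval(r) == s
```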
Upvotes: 96