Haoliang

Reputation: 1097

How can I convert special characters in a string back into escape sequences?

Suppose I have a string like 'a\tb'. If I print it, I see a and b separated by a tab. I want to see a\tb instead. How can I convert the string so that it prints that way?

Upvotes: 57

Views: 61377

Answers (3)

Karl Knechtel

Reputation: 61635

Please beware that the problem is underspecified in general. The built-in repr function will give you the canonical representation of the string as a string literal, as shown in the other answers. However, that may not be what you want.

For every string except the empty string, there are multiple ways of specifying the string contents. Even something as simple as ' ' could be written as '\x20', or '\u0020', or '\U00000020'. All of these are the same string (and that's ignoring the choice of enclosing quotes).
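As a quick illustration of that point, here is a minimal sketch (the variable name `variants` is only for this example) showing that these spellings all produce the same string object contents, so nothing in the string itself records which escape was typed:

```python
# Four spellings of the same one-character string (a single space).
variants = [' ', '\x20', '\u0020', '\U00000020']

# All of them compare equal to a plain space.
print(all(v == ' ' for v in variants))

# They also share one canonical representation.
print(len({repr(v) for v in variants}))
```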

Python's choices are not always what you might expect. Newlines will be represented as '\n', but backspace characters for example will be represented as a hex code, not as '\b'. On the other hand, even really fancy characters like emoji may very well be included literally, not escaped.
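A small sketch of those choices, for illustration only. It also prints the result of the built-in ascii function, which is not mentioned above but is a standard alternative that escapes every non-ASCII character:

```python
samples = {
    'newline': '\n',
    'backspace': '\b',
    'emoji': '\U0001F60A',
}

for name, s in samples.items():
    # repr keeps printable characters literal; ascii escapes anything non-ASCII.
    print(name, repr(s), ascii(s))
```

On a typical CPython 3 build, the newline shows as '\n', the backspace as '\x08', and the emoji appears literally under repr but as '\U0001f60a' under ascii.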


If you want to change that behaviour, you will have to write it yourself, after defining the specific rules you want to apply. One useful tool for this is the translate method of strings, which can simply apply a mapping to each Unicode code point in the input. The string classmethod str.maketrans can help with creating such a mapping, but that's still underpowered - you're stuck giving a specific set of code points to translate (and their translations), and then everything not specified is left alone.
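For instance, a minimal sketch of the str.maketrans approach (the names `table`, `text` and `escaped` are just for this example). Only the three listed characters are translated; the '\x01' at the end passes through unchanged, which is the limitation described above:

```python
# Map a fixed set of characters to their backslash escape sequences.
table = str.maketrans({'\t': '\\t', '\n': '\\n', '\b': '\\b'})

text = 'a\tb\n\x01'
escaped = text.translate(table)

# The tab and newline are now two-character sequences; '\x01' is untouched.
print(escaped)
```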

If you want to convert a large number of code points in a way that follows some kind of pattern, you might write a function for that. However, handling special cases, or multiple separate blocks of Unicode code points, could end up requiring a lot of tedious branching logic.
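A rough sketch of that function-based approach, to show where the branching comes from. The names `escape_char` and `escape` and the rule "escape whatever str.isprintable rejects" are assumptions chosen for this illustration, not a standard recipe:

```python
def escape_char(ch: str) -> str:
    """Escape one character if it is not printable (an assumed rule)."""
    if ch.isprintable():
        return ch
    code = ord(ch)
    # One branch per block of code points; special cases would add more.
    if code < 0x100:
        return f'\\x{code:02x}'
    if code < 0x10000:
        return f'\\u{code:04x}'
    return f'\\U{code:08x}'


def escape(text: str) -> str:
    """Apply escape_char to every character of text."""
    return ''.join(escape_char(ch) for ch in text)


print(escape('a\tb\x7f'))
```

Every extra rule (named escapes like \t, a different treatment for some block) means another branch in `escape_char`.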

Here is my attempt to get the best of both worlds. We create a dict subclass that implements __missing__ to dispatch to a function handling the appropriate character range, and caches the result. It is initialized with an iterable or mapping of hard-coded values (per the base dict constructor), *args that give (function, range) pairs (the function computes the result for characters whose numeric values fall in the range), and **kwargs (again per the base dict constructor). The hard-coded keys are given as Unicode characters, but translate looks up numeric code point values, so the constructor converts each key with ord.

class strtrans(dict):
    def __init__(self, iterable_or_mapping, *args, **kwargs):
        # Each positional argument is a (function, range) pair.
        self._handlers = args
        temp = dict(iterable_or_mapping, **kwargs)
        # translate looks up code points, so store ord(character) as the key.
        super().__init__({ord(k): v for k, v in temp.items()})

    def __missing__(self, value):
        self[value] = value # if no handler, leave the character untouched
        for func, r in self._handlers:
            if value in r:
                self[value] = func(value)
                break
        return self[value] # not missing any more.

Let's test it:

>>> hardcoded = {'\n': '\\n', '\t': '\\t', '\b': '\\b'}
>>> # Using the `.format` method bound to a string is a quick way
>>> # to get a callable that accepts the input number and formats it in.
>>> # For uppercase, use :02X instead of :02x etc.
>>> backslash_x = ('\\x{:02x}'.format, range(256))
>>> backslash_u = ('\\u{:04x}'.format, range(256, 65536))
>>> backslash_U = ('\\U{:08x}'.format, range(65536, 0x110000))
>>> mapping = strtrans(hardcoded, backslash_x, backslash_u, backslash_U)
>>> test = '\n\t\b\x01\u4EBA\U0001F60A'
>>> print(test.translate(mapping)) # custom behaviour - note lowercase output
\n\t\b\x01\u4eba\U0001f60a
>>> print(repr(test)) # canonical representation, with enclosing quotes
'\n\t\x08\x01人😊'
>>> print(test) # your terminal's rendering may vary!

       人😊

Upvotes: -3

wyz23x2

Reputation: 306

1:

(Python 2)

print ur'a\tb'

Note that in Python 3.x, u'' is equivalent to '', and the ur prefix is invalid.

Python 3:

print(r'a\tb')

2:

(Python 3)

print('a\\tb') 

3:

If you want to get the raw repr of an existing string, here is a small function (Python 3.6+):

def raw(string: str, replace: bool = False) -> str:
    r"""Returns the raw representation of a string. If replace is true, replace a single backslash's repr \\ with \."""
    r = repr(string)[1:-1]  # Strip the quotes from representation
    if replace:
        r = r.replace('\\\\', '\\')
    return r

Examples:

>>> print(raw('1234'))
1234
>>> print('\t\n'); print('='*10); print(raw('\t\n'))
    

==========
\t\n
>>> print(raw('\r\\3'))
\r\\3
>>> print(raw('\r\\3', True))
\r\3

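Note that this won't work for \N{...} named Unicode escapes: the name is resolved when the literal is parsed, so the string only contains the resulting character. Only a raw literal such as r'\N{...}' keeps the escape text. But I guess JSON doesn't have this :)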

>>> print(raw('\N{PLUS SIGN}'))
+

Upvotes: 5

John Zwinck

Reputation: 249502

print(repr('a\tb'))

repr() gives you the "representation" of the string rather than printing the string directly.
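A short sketch of what that looks like in practice. Note that repr includes the enclosing quotes; slicing them off, or using the unicode_escape codec, are two common ways to get only the escaped text (neither is part of the one-liner above, they are shown here as possible variations):

```python
s = 'a\tb'

# repr returns the string literal form, including the quotes.
print(repr(s))

# Strip the first and last character to drop the quotes.
print(repr(s)[1:-1])

# Alternative: the unicode_escape codec produces the escaped text as bytes.
print(s.encode('unicode_escape').decode('ascii'))
```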

Upvotes: 96
