Phoenix
Phoenix

Reputation: 4636

Python - The Standard Library - ascii( ) Function

I have begun to look through the Python Standard Library: (http://docs.python.org/3/library/functions.html)

In an attempt to further familiarise myself with basic python. When it comes to the explanation on the ascii( ) function, I'm not finding it clear.

Is someone able to supply a concise explanation giving examples of useful situations in which one may use the ascii( ) function please?

Upvotes: 5

Views: 3040

Answers (2)

Martijn Pieters
Martijn Pieters

Reputation: 1121406

ascii() is a function that encodes the output of repr() to use escape sequences for any codepoint in the output produced by repr() that is not within the ASCII range.

So a Latin 1 codepoint like ë is represented by the Python escape sequence \xeb instead.

This was the standard representation in Python 2; Python 3 repr() leaves most Unicode codepoints as their actual value in the output, as long as it is a printable character:

>>> print(repr('ë'))
'ë'
>>> print(ascii('ë'))
'\xeb'

Both outputs are valid Python string literals, but the latter uses just ASCII characters, while the former requires a Unicode-compatible encoding.

For unicode codepoints between U+0100 and U+FFFF \uxxxx escape code sequences are used, for anything over that the \Uxxxxxxxx form is used. See the available escape code syntax for Python string literals.

Like repr(), ascii() is a very helpful debugging tool, especially when it comes to exact contents of a string. Unlike repr(), the ascii() output makes many Unicode gotchas much more visible.

Take de-normalised codepoints for example; The ë character can be represented in two ways, as the U+00EB codepoint, or as an ASCII e plus combining diaeresis ¨ (codepoint U+0308):

>>> import unicodedata
>>> one, two = 'ë', unicodedata.normalize('NFD', 'ë')
>>> print(one, two)
ë ë
>>> print(repr(one), repr(two))
'ë' 'ë'
>>> print(ascii(one), ascii(two))
'\xeb' 'e\u0308'

Only with ascii() is it clear that two consists of two distinct codepoints.

Upvotes: 8

unutbu
unutbu

Reputation: 879143

ascii() can be useful for finding out exactly what is in a string. If a string has whitespace or unprintable characters, or if the terminal is turning the string into mojibake because of a character-encoding mismatch, it is useful to look at the ascii representation of the string since it provides a visible and unambiguous representation for those otherwise unreadable characters which will print the same way on everyone's terminals.

There are frequent questions on Stackoverflow regarding incorrectly printed strings, and sometimes it is hard to tell what's going on because the question only shows the mojibake and not an unambiguous representation of the string. When the questioner shows the ascii representation (or the repr in Python 2) then the situation becomes much clearer.

Upvotes: 1

Related Questions