Yohan Chevalier

Reputation: 73

Trying to print ASCII characters 128 to 160, why does it stop at 157?

I'm new to Python and I'm learning about character encoding, Unicode, ASCII and so on. I would like to print ASCII characters from their codes using the chr() function.

def table_ascii():
    """Print a table of ASCII characters with their values."""
    i = 127
    while i < 258:
        print(f"{i} -> {chr(i)}")
        i += 1

table_ascii()

Unfortunately, the result is wrong. It stops at code 157:

127 ->  
128 ->  
129 ->  
130 ->  
131 ->  
132 ->  

133 ->  

134 ->  
135 ->  
136 ->  
137 ->      
138 ->  
139 ->  
140 ->  
142 ->  
143 ->  
144 ->  
146 ->  
147 ->  
148 ->  
149 ->  
150 ->  
151 ->  
152 ->  
154 ->  
        155 ->  

157 ->

I understand these codes return blank but why do they stop the process?

Setup:

When I run this code in Visual Studio Code, the script produces output through 256. In my console (Linux Mate), however, the output stops. I find that difficult to understand.

Upvotes: 4

Views: 3110

Answers (3)

Good Pen

Reputation: 819

From https://www.wikiwand.com/en/C0_and_C1_control_codes#/Unicode:

In Unicode:

Unicode sets aside 65 code points in the general category Cc (control) for compatibility with ISO/IEC 2022. These are:

  1. U+0000 – U+001F (C0 controls) and U+007F (delete), assigned to the Basic Latin block
  2. U+0080 – U+009F (C1 controls), assigned to the Latin-1 Supplement block.

Unicode only specifies semantics for:

  1. U+0009 – U+000D, U+001C – U+001F
  2. U+0085 (NEL, Next Line. Equivalent to CR+LF. Used to mark end-of-line on some IBM mainframes.)

The rest of the control codes are transparent to Unicode,
and their meanings are left to higher-level protocols.

In the general category Cf (format), there are other format effector characters, e.g.:

  1. the zero-width joiner
  2. non-joiner (controlling ligature)
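The counts quoted above can be checked from Python. The following is only a sketch using the standard `unicodedata` module; the variable names are illustrative and the numbers reflect whatever Unicode database ships with your Python build.

```python
import unicodedata

# Collect every code point whose general category is Cc (control).
control_points = [
    cp for cp in range(0x110000)
    if unicodedata.category(chr(cp)) == "Cc"
]

# Split them into the C0 and C1 ranges named in the quote.
c0 = [cp for cp in control_points if cp <= 0x1F]
c1 = [cp for cp in control_points if 0x80 <= cp <= 0x9F]

print(len(control_points))             # total number of Cc code points
print(len(c0), len(c1))                # sizes of the C0 and C1 ranges
print(unicodedata.category(chr(157)))  # category of the code from the question
```

Code 157 from the question falls in the C1 range, so it is a control code rather than a printable character.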

Upvotes: 1

wjandrea

Reputation: 33136

Firstly, ASCII only goes up to 127 (0x7F). chr() actually returns the Unicode character.

I think the problem is that when U+9D (157) Operating System Command (OSC) is printed, your terminal starts a control string and waits for a String Terminator like U+9C String Terminator, U+1B Escape followed by U+5C backslash, or U+7 BEL. Since none of those sequences are ever printed later, the terminal stops showing the output. For more info, see ANSI escape code § Fe Escape sequences and C1 control codes on Wikipedia.

Unicode characters U+80 (128) to U+9F (159) are control characters, meaning they're not generally printable, so you were never going to get sensible output in the first place.
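If the goal is just to see the table without the terminal interpreting anything, one option is to print an escaped form of each non-printable character instead of the raw character. This is a minimal sketch, not the only way to do it; `safe_table` is a hypothetical helper name, and it relies on the built-in `str.isprintable()` and `ascii()`.

```python
def safe_table(start=127, stop=258):
    """Return table lines that never contain raw control characters."""
    lines = []
    for i in range(start, stop):
        ch = chr(i)
        # str.isprintable() is False for control characters (and a few
        # others, such as the no-break space at 160), so show those escaped.
        shown = ch if ch.isprintable() else ascii(ch)
        lines.append(f"{i} -> {shown}")
    return lines


# Print the range around the code where the original output stopped.
for line in safe_table(155, 160):
    print(line)
```

With this version, 157 shows up as an escape such as '\x9d' and the loop carries on, because the terminal never receives the control character itself.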

Upvotes: 8

tdelaney

Reputation: 77397

As mentioned in the comments, the characters between 128 and 160 are something of a no-man's land. Unicode classifies them as control codes but, with the exception of U+0085, does not define what they do, and they may have special meaning for various displays. That's the reason Unicode leaves them alone: too many different uses are in play.

A terminal such as a Linux xterm accepts control codes to do things like display text in color. Looking at Xterm Control Sequences, we see:

Privacy Message (PM is 0x9e)

That's 158 decimal, and it's one of xterm's 8-bit control characters. This starts a "privacy message" that continues until a defined string terminator character is seen. xterm doesn't implement "privacy message", and it looks from your output that it simply ignores the remaining output as being part of that message.

This is a VT100 type thing. Some terminals may implement some actions. Others may have a character mapped to that octet. You won't find any consistent implementation.
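To illustrate the point about octets, here is a small sketch of how code point 158 looks as bytes and how the same octet reads in another encoding. The encodings chosen (UTF-8, Latin-1, Windows-1252) are just examples; what your terminal does with the bytes depends on its own settings.

```python
pm = chr(158)  # U+009E, the code xterm lists as PM

utf8_bytes = pm.encode("utf-8")      # two bytes on a UTF-8 terminal
latin1_bytes = pm.encode("latin-1")  # a single octet, 0x9E

print(utf8_bytes)
print(latin1_bytes)

# The same octet decoded as Windows-1252 is an ordinary letter instead.
decoded = bytes([0x9E]).decode("cp1252")
print(f"U+{ord(decoded):04X}")
```

So the same number can be a control code in one setup and a visible letter in another, which is consistent with the output differing between Visual Studio Code and the Linux console.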

Upvotes: 4
