scott

Reputation: 1637

C++ ANSI escape codes and their interpretation when typed manually

I noticed this by accident. With the following code,

char teststring[20];
cin.getline(teststring, 20);

the prompt waits for user input. When I pressed the up arrow (out of muscle memory, to check the bash history), it printed the ANSI escape code ^[[A (got the details from here). When I then pressed backspace once and pressed Enter, the character A was deleted and it printed an unreadable garbage character instead of ^[[. But when I typed the same characters manually, or copy-pasted them (to make sure it was not a similar-looking ASCII symbol), without the last letter, it printed ^[[. Why do the two cases behave differently when the characters entered were the same?

This is the output

Upvotes: 1

Views: 539

Answers (1)

rici

Reputation: 241701

The Unix terminal is a very intricate beast. POSIX includes a pretty thorough description of its features; what follows is just a quick summary.

Normally, the terminal input device operates in "canonical" mode. In that mode, the terminal driver maintains a line buffer which it fills when necessary by reading user input. If the buffer is emptied and more data is requested by the program, the driver will read an entire line of input before providing any more data to the program. So if the buffer is empty, even a getc to read a single character will cause an entire line to be read into the terminal driver's buffer before the getc returns.
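As a rough illustration of canonical mode being a switchable setting, here is a minimal sketch using the POSIX termios interface. The helper names (is_canonical, without_canonical, set_noncanonical) are made up for this example; the flags and calls (ICANON, VMIN, VTIME, tcgetattr, tcsetattr) are the standard ones.

```cpp
#include <termios.h>
#include <unistd.h>

// True if canonical (line-at-a-time) input is enabled in these settings.
bool is_canonical(const termios& t) {
    return (t.c_lflag & ICANON) != 0;
}

// Returns a copy of the settings with canonical mode switched off.
// VMIN = 1 and VTIME = 0 make read() return as soon as one byte is available.
termios without_canonical(termios t) {
    t.c_lflag &= ~static_cast<tcflag_t>(ICANON);
    t.c_cc[VMIN] = 1;
    t.c_cc[VTIME] = 0;
    return t;
}

// Hypothetical helper: switch fd to non-canonical mode, storing the previous
// settings in `saved` so the caller can restore them with tcsetattr later.
// Returns false if fd is not a terminal or the change could not be applied.
bool set_noncanonical(int fd, termios& saved) {
    if (tcgetattr(fd, &saved) != 0) return false;
    termios changed = without_canonical(saved);
    return tcsetattr(fd, TCSANOW, &changed) == 0;
}
```

With canonical mode off, a program reading from the terminal would receive the bytes of a key press immediately, without waiting for Enter, and ERASE would no longer edit a line buffer on its behalf.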

As the driver reads input characters, it checks for certain special characters; anything else is added to the line buffer and echoed to the terminal device. (Input and output to a terminal device are independent; if the driver or the program didn't echo input, nothing would appear on the screen, which would usually be confusing. Programs turn echoing off in order to be able to accept passwords, for example.)
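To make the password case concrete, this is one way a program might turn echo off and back on again. It is only a sketch: the EchoOff class and without_echo function are hypothetical names, while ECHO, tcgetattr and tcsetattr are the usual termios facilities.

```cpp
#include <termios.h>
#include <unistd.h>

// Returns a copy of the settings with input echo disabled.
termios without_echo(termios t) {
    t.c_lflag &= ~static_cast<tcflag_t>(ECHO);
    return t;
}

// Hypothetical RAII guard: disables echo on fd for its lifetime and
// restores the saved settings on destruction.
class EchoOff {
public:
    explicit EchoOff(int fd)
        : fd_(fd), active_(tcgetattr(fd, &saved_) == 0) {
        if (active_) {
            termios quiet = without_echo(saved_);
            active_ = tcsetattr(fd_, TCSANOW, &quiet) == 0;
        }
    }
    ~EchoOff() {
        if (active_) tcsetattr(fd_, TCSANOW, &saved_);
    }
    EchoOff(const EchoOff&) = delete;
    EchoOff& operator=(const EchoOff&) = delete;

    // False if fd is not a terminal or the settings could not be changed.
    bool active() const { return active_; }

private:
    int fd_;
    termios saved_{};
    bool active_;
};
```

A program would create an EchoOff guard for STDIN_FILENO, read the password with its usual input call, and let the guard go out of scope to bring echo back.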

All of the special characters are configurable. There are quite a few; here are some of the more common ones:

  • Enter Inserts a newline character into the buffer, and terminates the input line so that the pending read will return.
  • Ctrl-D (EOF) The character itself is discarded, but the input is terminated and a pending read returns. If the input buffer is empty (i.e., the Ctrl-D was pressed at the beginning of a line), a zero-length buffer will be returned to the pending read, which will be interpreted as an end of file marker.
  • Bksp (ERASE) Unless the input buffer is empty, removes the last character from the input buffer and erases it from the screen.
  • Ctrl-C (INTR) Sends SIGINT to the process.
  • Ctrl-Z (SUSP) Sends SIGTSTP to the process.
  • Ctrl-U (KILL) Deletes the entire input buffer.
  • Ctrl-S (STOP) Stops output.
  • Ctrl-Q (START) Resumes output if it has been stopped with the STOP character.
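The line editing described in the list can be modelled with a toy simulation. This is not the kernel's implementation, only a simplified sketch: the LineDiscipline struct is hypothetical, echo and signals are not modelled, and the ERASE and KILL values (0x7F and 0x15) are assumed defaults that a real terminal may configure differently.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model of canonical-mode line editing.
struct LineDiscipline {
    unsigned char erase_char = 0x7F;  // assumed ERASE (DEL)
    unsigned char kill_char = 0x15;   // assumed KILL (Ctrl-U)
    std::string buffer;               // line currently being edited
    std::vector<std::string> ready;   // completed lines a read could return

    void feed(unsigned char c) {
        if (c == erase_char) {
            // ERASE: drop the last character, if there is one.
            if (!buffer.empty()) buffer.pop_back();
        } else if (c == kill_char) {
            // KILL: discard the whole line.
            buffer.clear();
        } else if (c == '\n') {
            // Enter: the newline is stored and the line becomes available.
            buffer.push_back('\n');
            ready.push_back(buffer);
            buffer.clear();
        } else {
            // Anything else, including ESC, is stored as an ordinary byte.
            buffer.push_back(static_cast<char>(c));
        }
    }
};
```

Feeding this model the bytes ESC, [, A, then ERASE, then a newline leaves a completed line containing ESC, [ and the newline. That matches the situation in the question: only the A is removed, and the raw ESC byte is still part of the line the program receives.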

When the Linux terminal driver is echoing characters, it will normally echo control characters (characters whose ASCII code is less than 0x20) as a caret (^) followed by the character whose code is 0x40 higher, which is usually a letter. The ESC character has the code 0x1B, so it will normally be echoed as a caret followed by the character 0x5B, which is an open square bracket. Hence, you would normally expect ESC to echo as ^[.
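The arithmetic is easy to check in code. The function below is a hypothetical helper, not anything the driver exposes, and it only applies the rule as stated above (the real echo treats a few characters, such as newline and tab, specially).

```cpp
#include <string>

// Hypothetical helper: render one byte in caret notation.
// Control characters (< 0x20) become '^' plus the character 0x40 higher;
// everything else is returned unchanged.
std::string caret_notation(unsigned char c) {
    if (c < 0x20) {
        return std::string("^") + static_cast<char>(c + 0x40);
    }
    return std::string(1, static_cast<char>(c));
}
```

For example, caret_notation(0x1B) gives ^[ and caret_notation(0x03) gives ^C, while a printable character such as A comes back as itself.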

Many keys on the keyboard actually send more than one character, and almost all of these sequences start with ESC[. The up arrow, for example, sends the codes ESC[A, and so if you are running a naive program which doesn't handle cursor-movement keys, you will see ^[[A echoed when you press the up arrow key.
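One way to see that the echoed ^[[A and a hand-typed ^[[A are different input is to dump the bytes the program actually received. The hex_dump function here is a hypothetical helper written for this illustration.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format each byte of the input as two hex digits,
// separated by spaces.
std::string hex_dump(const std::string& bytes) {
    std::string out;
    char buf[4];
    for (unsigned char c : bytes) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

The up arrow sequence "\x1b[A" dumps as 1b 5b 41 (three bytes), whereas the four typed characters "^[[A" dump as 5e 5b 5b 41. The two look the same when echoed, but the program reads different data in each case.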

The character you see is one of the ways used to show characters which don't correspond to any Unicode glyph. The box contains four hex digits, which correspond to the Unicode codepoint, in this case U+001B, which is an ESC character. I don't know why this happened, but it is most likely the result of a race condition between the various components which contribute to the terminal echo.

Upvotes: 2
