felknight

Reputation: 1421

Are Unicode Control Characters still in use?

From what I understand, control characters were used in terminals for specific purposes, like \n, \t and \r to break lines and create tabulations. These are still heavily used, and their existence makes sense.

But there are others that apparently are no longer used, like Set Transmit State (STS) and String Terminator (ST).

For some reason the Unicode standard decided to keep them. Are the latter two (STS, ST), for example, still in use in modern applications?

What is the reason to keep them in modern times?

Upvotes: 3

Views: 787

Answers (2)

Simon Kissane

Reputation: 5258

Unicode inherits 65 control characters from earlier standards – the 32 C0 controls, DEL, and the 32 C1 controls. There is no single answer to "are they still in use?" – the answer depends on the character. It isn't feasible to discuss all 65 in a single answer, so let me just address the two specific ones you asked about, ST (String Terminator, 0x9C) and STS (Set Transmit State, 0x93).
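To make the 65 concrete: they are exactly the code points in Unicode's general category Cc, namely U+0000 to U+001F (C0), U+007F (DEL) and U+0080 to U+009F (C1). Here is a minimal shell sketch that tests a code point against those ranges and counts them. The function name `is_legacy_control` is made up for illustration.

```shell
# Hypothetical helper: succeed if code point $1 is one of the 65 control
# characters Unicode inherited (general category Cc):
#   C0 U+0000..U+001F, DEL U+007F, C1 U+0080..U+009F
is_legacy_control() {
  n=$(( $1 ))
  [ "$n" -ge 0 ] && [ "$n" -le 31 ] && return 0    # C0
  [ "$n" -eq 127 ] && return 0                     # DEL
  [ "$n" -ge 128 ] && [ "$n" -le 159 ]             # C1
}

# Count the matches in U+0000..U+00FF; prints 65 (32 + 1 + 32)
count=0
cp=0
while [ "$cp" -le 255 ]; do
  if is_legacy_control "$cp"; then count=$(( count + 1 )); fi
  cp=$(( cp + 1 ))
done
echo "$count"
```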

String Terminator

This control character is understood by xterm, and possibly some other terminal emulators as well. There are five C1 control characters which introduce "out-of-band" messages:

0x90    DCS     DEVICE CONTROL STRING
0x98    SOS     START OF STRING
0x9D    OSC     OPERATING SYSTEM COMMAND
0x9E    PM      PRIVACY MESSAGE
0x9F    APC     APPLICATION PROGRAM COMMAND

For all five of these, the message is terminated with an ST. xterm supports OSC to perform various operations, such as changing the window title. xterm doesn't support any of the others, but some other terminal emulators provide various functions via APC as well. Physical DEC VT series terminals used DCS so the host could modify the terminal configuration, e.g. installing custom fonts.
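All five share one shape: an introducer, a payload, then ST. In 7-bit form each introducer is ESC followed by the byte 0x40 below its C1 code (DCS is ESC P, SOS is ESC X, OSC is ESC ], PM is ESC ^, APC is ESC _). A rough sketch of that framing, using a hypothetical helper named `string_control`; it only builds the bytes and says nothing about which terminals act on them:

```shell
# Hypothetical helper: wrap PAYLOAD ($2) in one of the five C1 string
# controls ($1), using the 7-bit encodings, and terminate it with ST (ESC \).
string_control() {
  case "$1" in
    DCS) intro='P' ;;   # 0x90 -> ESC P
    SOS) intro='X' ;;   # 0x98 -> ESC X
    OSC) intro=']' ;;   # 0x9D -> ESC ]
    PM)  intro='^' ;;   # 0x9E -> ESC ^
    APC) intro='_' ;;   # 0x9F -> ESC _
    *) echo "unknown string control: $1" >&2; return 1 ;;
  esac
  printf '\033%s%s\033\\' "$intro" "$2"
}
```

For instance, `string_control OSC '0;my title'` emits ESC ] 0;my title ESC \, the window-title sequence.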

To give an example, you can change your window title in xterm like this:

printf '\x1b]0;first example\x1b\\'

That's not sending any C1 controls, just escape – but ESC ] is the 7-bit encoding of OSC, and ESC \ is the 7-bit encoding of ST. The 0; prefix indicates that the operating system command you want to run is "Change Icon Name and Window Title". (You can actually set the icon name and window title independently, if you want, by using 1; and 2; instead.)
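Following that numbering, here is a minimal sketch with two hypothetical helper names that set the icon name and the window title separately, using the same 7-bit ESC ] ... ESC \ framing as the printf example above (octal escapes instead of \x so it also works with a non-bash printf):

```shell
# OSC 1 sets only the icon name, OSC 2 only the window title
# (OSC 0 sets both). Both use the 7-bit framing ESC ] ... ESC \.
set_icon_name()    { printf '\033]1;%s\033\\' "$1"; }
set_window_title() { printf '\033]2;%s\033\\' "$1"; }

set_icon_name 'short name'
set_window_title 'a longer, more descriptive window title'
```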

xterm will actually accept 8-bit C1 controls:

printf '\x9d0;second example\x9c'

That should also work. If it doesn't, it is probably because your xterm is using a UTF-8 locale, in which case you have to UTF-8 encode the C1 controls:

printf '\xc2\x9d0;third example\xc2\x9c'

You can start an xterm with UTF-8 disabled using:

LANG=C xterm
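The three variants differ only in how OSC and ST are encoded. The UTF-8 form is just 0xC2 followed by the original C1 byte, because U+0080 to U+00BF encode as C2 80 to C2 BF. A sketch of a wrapper that picks one; `set_title` and its mode names are hypothetical:

```shell
# Hypothetical wrapper: set the window title (OSC 0) with one of three
# encodings of the same OSC ... ST sequence.
#   7bit: ESC ] ... ESC \                (widest support)
#   8bit: raw C1 bytes 0x9D ... 0x9C     (xterm in a non-UTF-8 locale)
#   utf8: U+009D ... U+009C as C2 9D ... C2 9C (xterm in a UTF-8 locale)
set_title() {
  case "$1" in
    7bit) printf '\033]%s\033\\' "0;$2" ;;
    8bit) printf '\235%s\234' "0;$2" ;;
    utf8) printf '\302\235%s\302\234' "0;$2" ;;
    *) echo "usage: set_title 7bit|8bit|utf8 TITLE" >&2; return 1 ;;
  esac
}

set_title 7bit 'first example'
```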

Many other terminal emulators, which advertise xterm compatibility, support the 7-bit encoded OSC 0;title ST sequence (i.e. ESC ] 0;title ESC \), but not the 8-bit C1 encoded form (whether UTF-8 encoded or not). Note also that for legacy reasons, xterm (and many of its clones) support the BEL character (0x07) instead of ST, but strictly speaking that is incorrect.
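For comparison, the legacy BEL-terminated form looks like this (the helper name is hypothetical). It differs from the standard 7-bit form only at the end: one byte, 0x07, instead of the two bytes ESC \.

```shell
# Legacy variant accepted by xterm and many clones: OSC terminated by
# BEL (0x07) instead of ST. ST (ESC \) is the standard terminator.
set_title_bel() { printf '\033]0;%s\007' "$1"; }

set_title_bel 'bel example'
```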

Set Transmit State

This was used by some terminals which had block mode support – for example the Ann Arbor Ambassador, and DEC VT131 (see PDF page 171 of the VT131 Video Terminal User Guide). For both, STS (ESC S) was a signal that the user had finished filling out a form, and had pressed the ENTER key (or SEND on the Ambassador) to tell the terminal to transmit its contents to the host.

The VT131 block mode supported two sub-modes of operation – immediate and deferred (selected by the DECTEM control sequence). In immediate mode, as soon as the user hit ENTER, the terminal would transmit the form to the host. In deferred mode, when the user hit ENTER, the terminal would send STS to the host to signal it was ready to transmit the form data, but wouldn't actually send it until it had received the DECXMIT control sequence (ESC 5).
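The deferred-mode handshake can be sketched from the host's side. This is a deliberately simplified illustration under assumptions: `host_reply` is a hypothetical name, it looks only for the 7-bit STS (ESC S) in a chunk of received data, and it ignores everything else a real VT131 host application would have to handle (reading the form data, timeouts, other sequences).

```shell
# Hypothetical, simplified host-side handling of VT131 deferred block mode.
# $1 is data just received from the terminal; if it contains STS (ESC S),
# reply with DECXMIT (ESC 5) so the terminal transmits the form contents.
host_reply() {
  esc=$(printf '\033')
  case "$1" in
    *"${esc}S"*) printf '%s5' "$esc" ;;  # STS seen: ESC 5 requests the form
    *) ;;                                # no STS yet: wait for more input
  esac
}
```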

The VT131 was 7-bit only, so it doesn't actually support encoding STS as 0x93, it only supports it as ESC S. However, the two encodings are logically equivalent; and, some later DEC terminals, such as the VT340, supported both block mode and 8-bit controls (enabled via the S8C1T control sequence), so they may have supported sending STS as 0x93.
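The equivalence follows one rule for every C1 control: the 7-bit form is ESC followed by the C1 byte minus 0x40. So 0x93 (STS) becomes ESC S, just as 0x9D (OSC) becomes ESC ] and 0x9C (ST) becomes ESC \. A small sketch of that mapping; `c1_to_7bit` is a hypothetical name:

```shell
# Hypothetical helper: print the 7-bit (ESC Fe) form of a C1 control.
# A C1 code in 0x80..0x9F maps to ESC followed by (code - 0x40).
c1_to_7bit() {
  code=$(( $1 ))
  if [ "$code" -lt 128 ] || [ "$code" -gt 159 ]; then
    echo "not a C1 control: $1" >&2
    return 1
  fi
  # a computed octal escape lets printf emit the single raw byte
  printf "\\033\\$(printf '%03o' $(( code - 64 )))"
}

c1_to_7bit 0x93   # emits ESC S, the 7-bit form of STS
```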

Mainstream terminal emulators don't support VT131 block mode, but there are some niche ones which do. It was used mainly with business applications for OpenVMS, and it is possible that some legacy OpenVMS apps which employ it are still in use today. An example of a niche terminal emulator which implements this block mode (including STS) is Siemens SINUMERIK 880. That's actually an early 1990s vintage operator panel for industrial control applications, with an embedded VT340 emulator – if a minicomputer (such as a MicroVAX) was being used to control some industrial process, the SINUMERIK 880's VT340 emulation would enable operators to interact with it from a location convenient to the actual process itself.

So, while there is widely available software that supports ST (xterm), if anyone is still using software that supports STS, it is going to be some obscure legacy system.

Upvotes: 3

user149341

One of the major initial goals of Unicode was to be able to unambiguously represent any valid character from any existing character encoding, making it possible to "round-trip" text from another character encoding through Unicode (e.g., ISO-8859-1 to Unicode to ISO-8859-1) without ending up with something different from the original text.

Removing any of these characters from Unicode would have made it impossible to losslessly convert text which included those control characters. Leaving them in place is harmless, and makes it much easier to convert ASCII text to Unicode (since the code points between U+0000 and U+007F all align with ASCII).
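The round trip can be checked directly. A minimal sketch, assuming an `iconv` command that accepts the encoding names ISO-8859-1 and UTF-8 (as the glibc one does) plus `mktemp` and `cmp`; the function name is hypothetical. It pushes all 256 ISO-8859-1 bytes, C0 and C1 controls included, through UTF-8 and back and compares the result with the original:

```shell
# Round-trip every ISO-8859-1 byte (0x00..0xFF, C0 and C1 controls included)
# through UTF-8 and back, then compare with the original bytes.
latin1_roundtrip_ok() {
  tmp=$(mktemp) || return 1
  i=0
  while [ "$i" -le 255 ]; do
    # computed octal escape -> one raw byte with value $i
    printf "\\$(printf '%03o' "$i")" >> "$tmp"
    i=$(( i + 1 ))
  done
  iconv -f ISO-8859-1 -t UTF-8 "$tmp" |
    iconv -f UTF-8 -t ISO-8859-1 |
    cmp -s - "$tmp"
  status=$?
  rm -f "$tmp"
  return "$status"
}

if latin1_roundtrip_ok; then echo "round trip is lossless"; fi
```

If the C1 range had been dropped from Unicode, bytes 0x80 to 0x9F would have nowhere to go in the middle step and the comparison would fail.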

Upvotes: 4