mbrochma

Reputation: 3

Can tab characters appear in an iso-8859-8 file?

I have a file that I believe to be in the ISO-8859-8 format. However, it has tabs in it, which doesn't seem to appear in this character set:

https://en.wikipedia.org/wiki/ISO/IEC_8859-8

Does this mean that the file isn't in the ISO-8859-8 format after all? Can ISO-8859-8 encoded characters be combined with tabs?

Upvotes: 0

Views: 196

Answers (2)

Giacomo Catenazzi

Reputation: 9533

An "ISO 8859-8 file" is usually interpreted as a file that may also contain the standard C0 and C1 control characters and DEL, so you can use control characters without problems.

But technically, ISO 8859-8 only defines the characters shown on Wikipedia. Remember that files were not so relevant at the time; what mattered was the transmission of data between different systems (the received data might then be stored as a native file, i.e. transcoded, so the concept of a "file encoding" was not so important). So we have ISO 2022 and ISO 4873, which define the basic framework of encodings for transmitting data, and they define the escape sequences (the "ANSI escape sequences"). With such sequences you can redefine how the control sets (C0, C1) and the two graphic character blocks (G0, G1) are used.

So your system might use ASCII for the initial communication, then switch the C0 set for better control between the systems, and then load G0 and G1 with ISO 8859-8 so you can transmit your text file (and then perhaps switch to another encoding for a second stream of data in another language).

So, technically, the Wikipedia tables are correct. But nowadays we share files without transcoding them, and therefore without switching encodings via escape sequences, so we use "ISO 8859-8 file" to mean the ISO 8859-8 graphic characters (G0 and G1) plus some extra control characters (usually TAB and newline characters such as CR and LF, sometimes also NUL or VT). This is also what the name iso-8859-8 registered with IANA means, and so what web browsers and email use. Note, though, that you usually cannot use every C0 and C1 control code: some are forbidden by standards, and others should usually not appear in files (e.g. ANSI escape sequences or NUL bytes), because they may be misinterpreted or discarded, which can even lead to security problems.

In short: technically, ISO 8859-8 does not define control codes, but in files we usually allow some of them (TAB is one). Check the file protocol to know which control codes are allowed (please, no BEL or ANSI escape sequences).
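Following that advice, here is a minimal Python sketch for listing the control characters a file actually contains. The allow-list (TAB, LF, CR) and the function name `find_disallowed_controls` are illustrative assumptions, not part of any standard; adjust them to the protocol you are targeting. It relies on Python's built-in `iso-8859-8` codec, which decodes control bytes to the code points of the same value and raises `UnicodeDecodeError` on bytes that ISO 8859-8 leaves unassigned.

```python
import sys

# Hypothetical policy: only TAB, LF and CR are accepted in the file.
ALLOWED_CONTROLS = frozenset("\t\n\r")


def find_disallowed_controls(data: bytes, allowed=ALLOWED_CONTROLS):
    """Return (offset, byte value) pairs for C0, DEL and C1 bytes not in `allowed`."""
    # Strict decoding also rejects bytes unassigned in ISO 8859-8 (e.g. 0xA1).
    text = data.decode("iso-8859-8")
    problems = []
    # Single-byte encoding: character index == byte offset.
    for offset, ch in enumerate(text):
        code = ord(ch)
        is_control = code < 0x20 or 0x7F <= code <= 0x9F
        if is_control and ch not in allowed:
            problems.append((offset, code))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for offset, code in find_disallowed_controls(f.read()):
            print(f"offset {offset}: control byte 0x{code:02X}")
```

For example, `find_disallowed_controls(b"col1\tcol2\x07\n")` flags only the BEL at offset 9, while the tab and the line feed pass.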

Upvotes: 0

dan04

Reputation: 91209

Yes.

The tab (\t) character is one of the standard C0 control codes, along with Null (\0), Bell/Alert (\a), Backspace (\b), Line Feed (\n), Vertical Tab (\v), Form Feed (\f), Carriage Return (\r), Escape (\x1B), etc.
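To make this concrete, the escapes listed above can be mapped to their byte values. This sketch assumes Python's built-in `iso-8859-8` codec as the reference; it checks that each of these characters lies in the C0 range (below 0x20) and that a single byte with that value decodes to the same control character (tab is byte 0x09).

```python
# The C0 escapes mentioned above, keyed by their usual abbreviations.
C0_EXAMPLES = {
    "NUL": "\0", "BEL": "\a", "BS": "\b", "TAB": "\t", "LF": "\n",
    "VT": "\v", "FF": "\f", "CR": "\r", "ESC": "\x1b",
}

for name, ch in C0_EXAMPLES.items():
    code = ord(ch)
    # Every one of them is a C0 control (byte 0x00-0x1F).
    assert code < 0x20
    # Python's iso-8859-8 codec maps the byte to the same code point.
    assert bytes([code]).decode("iso-8859-8") == ch
    print(f"{name:>3}  0x{code:02X}")
```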

According to Wikipedia's page on ISO/IEC 8859:

The ISO/IEC 8859 standard parts only define printable characters, although they explicitly set apart the byte ranges 0x00–1F and 0x7F–9F as "combinations that do not represent graphic characters" (i.e. which are reserved for use as control characters) in accordance with ISO/IEC 4873; they were designed to be used in conjunction with a separate standard defining the control functions associated with these bytes, such as ISO 6429 or ISO 6630. To this end a series of encodings registered with the IANA add the C0 control set (control characters mapped to bytes 0 to 31) from ISO 646 and the C1 control set (control characters mapped to bytes 128 to 159) from ISO 6429, resulting in full 8-bit character maps with most, if not all, bytes assigned. These sets have ISO-8859-n as their preferred MIME name or, in cases where a preferred MIME name is not specified, their canonical name. Many people use the terms ISO/IEC 8859-n and ISO-8859-n interchangeably.

IOW, even though the official character chart only lists the printable characters, the C0 control characters, including Tab, are for all practical purposes part of the ISO-8859-n encodings.

The article you linked even says so explicitly:

ISO-8859-8 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429.
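A small Python sketch of the byte ranges quoted above: `byte_class` is a hypothetical helper, and its labels (C0/C1 control, DEL, G0/G1 graphic) are descriptive names chosen for this example rather than identifiers from the standard. The loop then checks, against Python's built-in codec registered under the IANA name `iso-8859-8`, that every control byte decodes to the code point of the same value, i.e. that the C0 and C1 sets really are part of the encoding in practice.

```python
def byte_class(b: int) -> str:
    """Classify a byte using the ISO/IEC 4873 ranges quoted above."""
    if not 0 <= b <= 0xFF:
        raise ValueError(f"not a byte: {b}")
    if b <= 0x1F:
        return "C0 control"
    if b == 0x7F:
        return "DEL"
    if 0x80 <= b <= 0x9F:
        return "C1 control"
    if b <= 0x7E:
        return "G0 graphic"  # 0x20-0x7E (0x20 is SPACE)
    return "G1 graphic"      # 0xA0-0xFF (some positions unassigned in 8859-8)


# Control bytes decode to themselves with the IANA-style iso-8859-8 codec.
for b in range(0x100):
    if byte_class(b) in ("C0 control", "DEL", "C1 control"):
        assert bytes([b]).decode("iso-8859-8") == chr(b)
```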

Upvotes: 1
