Benjamin

Reputation: 2798

Extended Ascii in Linux

How would I print these characters in Linux?

│ (ascii 179)

├ (ascii 195)

└ (ascii 192)

─ (ascii 196)

I cannot find any octal values that work with echo -e "\0xxx". Any ideas?

Upvotes: 10

Views: 27665

Answers (5)

BlueChip

Reputation: 170

ASCII (first published in 1963) is actually a /seven/ bit encoding standard, and as such only the characters in the range {0..127} are defined.

Characters {128..255} are (commonly) referred to as the "alt page" and can be mapped any way you wish. There are MANY "standard" mappings, of which "CodePage 437" is/was a popular choice (thanks to DOS support, and "ANSI artists" of the 80s and 90s).

Under Linux, you can use luit to intercept stdin and stdout and convert data on the fly.

Create a suitable demo file

(for i in `seq 128 255` ; do printf "\x$(printf "%02x" $i)" ; done; echo) >demo.txt

Hexdump the file (sanity check)

hexdump -C demo.txt

Display the file normally

cat demo.txt

Display the file as cp437 (as per OP):

luit -encoding cp437 cat demo.txt

Display the file as cp850 (as an additional example):

luit -encoding cp850 cat demo.txt

You can also just run luit -encoding cp437 to open a sub-shell with cp437 encoding (use ^D to exit luit), at which point your [OP] echo statements should work as desired.
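
If the terminal is already UTF-8, another route is to convert the cp437 bytes rather than switch the terminal encoding. The sketch below is only an illustration: it assumes iconv is installed with CP437 support, and cp437_to_utf8 is a made-up helper name. The octal escapes are the codes from the question (179, 195, 192, 196).

```shell
# Hypothetical helper: read CP437 bytes on stdin, write UTF-8 on stdout.
# Assumes an iconv build that knows the CP437 charset.
cp437_to_utf8() {
  iconv -f CP437 -t UTF-8
}

# 179, 195, 192, 196 in octal are 263, 303, 300, 304.
printf '\263\303\300\304\n' | cp437_to_utf8
```

The same function can be fed a whole file, e.g. cp437_to_utf8 <demo.txt.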

Upvotes: 1

shmatt

Reputation: 41

Because some people may still want to know this...

See the lines that use iconv to do the translation.

To print all ASCII/extended ASCII (CP437) codes in a Linux bash script:

# heading index with div line
printf "\n      "; # indent

for x in {0..15}; do printf "%-3x" $x; done;
printf "\n%46s\n" | sed 's/ /-/g;s/^/      /';

# two lines with dots to represent control chars
c=$(echo "fa" | xxd -p -r | iconv -f 'CP437//' -t 'UTF-8')
printf "%32s" | sed 's/../'"$c"'  /g;s/^/  0   /;s/$/\n\n/'
printf "%32s" | sed 's/../'"$c"'  /g;s/^/  1   /'

# convert dec to codepage 437 in a table
for x in {32..255};
do

  # newline every 16 translated code values
  (( x % 16 == 0 )) && printf "\n\n"

  # left index: high hex digit of the row (x / 16)
  (( x % 16 == 0 )) && printf "  %-4x" $(( x / 16 ))

  # conversion of x integer value to symbol
  printf "%02x" $x | xxd -p -r | iconv -f 'CP437//' -t 'UTF-8' | sed 's/.*/&  /'

  # div line
  (( x == 127 )) && printf "%46s" | sed 's/ /-/g;s/^/      /;i\ '

done
printf "%46s" | sed 's/ /-/g;s/^/\n      /;s/$/\n      /'; # div line
for x in {0..15}; do printf "%-3x" $x; done;
echo
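
The core step of the script (one code in, one character out) can be pulled into a small function. This is a sketch rather than a drop-in replacement: cp437_char is a name invented here, it assumes iconv with CP437 support, and it uses printf's %b with an octal escape in place of xxd.

```shell
# Hypothetical helper: print the CP437 character for a decimal code (0-255).
cp437_char() {
  # Turn the decimal code into a \0ooo octal escape, let %b emit that byte,
  # then convert the single CP437 byte to UTF-8.
  printf '%b' "\\0$(printf '%o' "$1")" | iconv -f CP437 -t UTF-8
}

# The four characters from the question.
for code in 179 195 192 196; do
  cp437_char "$code"
done
echo
```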

Upvotes: 3

AnxiousNut

Reputation: 61

You can use the exact codes you listed from the extended ASCII character set (e.g. 195 for ├), provided the terminal uses an encoding that maps them to those characters.

Linux lacks support for the non-standard extended ASCII character set, which is why the characters are not displayed. However, I found another character set that is available on Linux and is very similar to it: IBM855.

All you have to do is change the character encoding of your command line application to IBM855. All the popular box-drawing characters have the same codes as in the extended ASCII character set, which is what matters most.

You may compare the sets by this image and this image.

PS: If you're using gnome-terminal, you can add IBM855 charset by clicking the "Terminal" menu from the menu bar -> "set character encoding" -> "Add or Remove". Look for IBM855, and add it. Now just choose the encoding from "terminal"->"set character encoding"->"Cyrillic (IBM855)".

The boxes were enough for my homework. Hope this helps. :)

Upvotes: 6

drysdam

Reputation: 8637

After much poring over man printf and info printf, I think I've gotten this to work.

The basic issue seems to be that bash has a built-in printf that doesn't work for this, so the external printf has to be run through env. And, despite what the man/info pages say, \U doesn't work. \u still does, though.

env printf '\u2502'

gets me a vertical box character.
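
If neither printf understands \u on a given system, the UTF-8 bytes of each character can be written as octal escapes, which is close to what the question was after. A minimal sketch, assuming a UTF-8 terminal; the function names are made up for illustration.

```shell
# UTF-8 bytes for each box character, written as octal escapes.
# These work with any POSIX printf, whether or not it supports \u.
box_vertical()   { printf '\342\224\202'; }  # U+2502 │
box_tee()        { printf '\342\224\234'; }  # U+251C ├
box_corner()     { printf '\342\224\224'; }  # U+2514 └
box_horizontal() { printf '\342\224\200'; }  # U+2500 ─

# Draw a tiny tree using the helpers above.
draw_tree() {
  printf '%s\n' "root"
  printf '%s%s %s\n' "$(box_tee)" "$(box_horizontal)" "child one"
  printf '%s\n' "$(box_vertical)"
  printf '%s%s %s\n' "$(box_corner)" "$(box_horizontal)" "child two"
}

draw_tree
```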

Upvotes: 9

Ignacio Vazquez-Abrams

Reputation: 798686

Either switch the font to one that is in PC-8/CP437 encoding, or use the Unicode values for those characters instead, encoded into the current charset.
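
The second option can be sketched with iconv: keep the characters as Unicode (UTF-8 here) and encode them into whatever charset the terminal expects. encode_for is a hypothetical helper, and CP437 is used as the target only so the result is easy to check; in practice the target would be the output of locale charmap, assuming iconv supports that charset.

```shell
# Hypothetical helper: encode UTF-8 text from stdin into the charset named in $1.
encode_for() {
  iconv -f UTF-8 -t "$1"
}

# U+2502 (│) as UTF-8 bytes, encoded for a CP437 terminal, shown as hex.
# This should come out as the single byte b3 (decimal 179).
printf '\342\224\202' | encode_for CP437 | od -An -tx1
```

For the current terminal, the call would look like printf '\342\224\202\n' | encode_for "$(locale charmap)".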

Upvotes: 2
