Arjun Mehta

Reputation: 2542

How might I calculate the terminal column width of various characters?

I'm looking to calculate the number of terminal columns that various printing and non-printing ASCII/Unicode characters will occupy in a terminal view.

For example, a horizontal tab (\t) occupies 8 columns, color codes (e.g. \x1b[32m) occupy 0 columns, and wide characters (e.g. 한) might occupy 2 columns. Of course, there are many characters in the basic ASCII set that occupy only 1 column (e.g. a-z, A-Z, 0-9, punctuation).

I've come across the Node.js module wcwidth, which seems to handle wide characters but doesn't do what I'd expect for other characters, like color codes and tabs.

For example:

var wcwidth = require('wcwidth'); // npm install wcwidth

console.log("TAB WIDTH", wcwidth('\t'));
console.log("한 WIDTH", wcwidth('한'));
// note: '\x1b32m' lacks the '[' of a real SGR color sequence ('\x1b[32m')
console.log("Color Code WIDTH", wcwidth('\x1b32m'));
console.log("X WIDTH", wcwidth('X'));

Outputs:

TAB WIDTH 0
한 WIDTH 2
Color Code WIDTH 3
X WIDTH 1

I can't seem to find any information about this anywhere, though I'd imagine it would be a common thing people have had to solve in the ancient past.

If there's a way to do this with a bash script, or any library, application or tool, I'm totally open to that as well.

Any help much appreciated! :) Thanks

Upvotes: 1

Views: 1344

Answers (2)

rici

Reputation: 241691

A tab does not occupy 8 columns. It advances the output position by at least one column, and then far enough to ensure that the next character will be output at the next column whose index is 0 mod 8 (or 1 mod 8 if you count from 1). In other words, you cannot tell how wide a tab is unless you know where you are on the line.
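To make the position dependence concrete, here is a minimal sketch. The function names are hypothetical (not from any library); it assumes fixed tab stops every 8 columns, 0-based column numbers, and ignores the right margin, where a real terminal stops the cursor at the last column.

```javascript
// Hypothetical helper: the 0-based column the cursor lands on after a
// horizontal tab, assuming fixed tab stops every `tabSize` columns.
// Ignores the right margin (a real terminal clamps at the last column).
function columnAfterTab(col, tabSize = 8) {
  return (Math.floor(col / tabSize) + 1) * tabSize;
}

// The "width" of a tab is therefore a function of the starting column.
function tabWidthAt(col, tabSize = 8) {
  return columnAfterTab(col, tabSize) - col;
}

console.log(tabWidthAt(0)); // 8
console.log(tabWidthAt(5)); // 3
console.log(tabWidthAt(7)); // 1
console.log(tabWidthAt(8)); // 8
```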

A color code (\x1b[32m) might occupy zero columns, but it also might not; that depends on the terminal emulator behind the console. Most terminal emulators will recognize the CSI Pm m (SGR) sequence, but other codes are handled much more idiosyncratically. For example,

printf $'\x1b]2;A window\x1b\\'

will set the window title in xterm, and hence will produce no visible output. But on a Linux console, the text ;A window will be displayed, occupying 9 columns.
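A rough way to discard such sequences before measuring is sketched below. It assumes an xterm-like terminal that consumes ECMA-48 CSI sequences and OSC strings (terminated by BEL or ESC \) without printing anything. `stripEscapes` is a hypothetical helper: on a Linux console it would undercount the OSC example above, and other escape families (DCS, two-character ESC sequences, unterminated OSC) are not handled.

```javascript
// Sketch only, assuming xterm-like behaviour:
//  CSI: ESC [ , parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F,
//       one final byte 0x40-0x7E (e.g. ESC[32m)
//  OSC: ESC ] ... terminated by BEL (0x07) or ST (ESC \) (e.g. window title)
const ESCAPE_RE =
  /\x1b\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)/g;

function stripEscapes(s) {
  return s.replace(ESCAPE_RE, "");
}

console.log(JSON.stringify(stripEscapes("\x1b[32mgreen\x1b[0m"))); // "green"
console.log(JSON.stringify(stripEscapes("\x1b]2;A window\x1b\\"))); // ""
```

Note that the question's '\x1b32m' (with no '[') is not a CSI sequence at all, so this sketch leaves it untouched.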

In short, it is not so easy a problem, and you can only answer it with a lot of context because there is no absolute answer.

Upvotes: 4

deltab

Reputation: 2556

This is indeed an issue for any program that needs to know where the cursor is on screen, from tabular output in ls, through editable command lines, to full-screen applications. As you've noticed, it's not solved by wcwidth or wcswidth, which are defined only for (strings of) printable characters. (Even that is not well defined for many characters.) Also, control sequences can change not only colours but also the cursor position and even, where supported, font size.
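Putting those pieces together, a width estimator might look roughly like the sketch below. It assumes an xterm-like terminal, a single line of output, fixed 8-column tab stops, and a pluggable per-code-point width function. `displayWidth` and `roughCharWidth` are hypothetical names, and the default width table is deliberately crude; you could pass something like `cp => wcwidth(String.fromCodePoint(cp))` using the npm wcwidth module instead. It also treats cursor-movement sequences as zero-width, which is exactly the kind of case that makes a general answer impossible.

```javascript
// Sketch: estimate the columns one line occupies in an xterm-like terminal.
// Assumptions: CSI/OSC sequences print nothing, tab stops are fixed every
// `tabSize` columns, and `charWidth` gives the width of one code point.
const ESCAPE_RE =
  /\x1b\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)/g;

// Hypothetical, deliberately crude default; a real program needs a full
// East Asian Width table and combining-character handling.
function roughCharWidth(cp) {
  if (cp < 0x20 || (cp >= 0x7f && cp < 0xa0)) return 0; // C0/C1 controls
  if (cp >= 0x1100 && cp <= 0x115f) return 2; // Hangul Jamo initial consonants
  if (cp >= 0xac00 && cp <= 0xd7a3) return 2; // Hangul syllables
  if (cp >= 0x4e00 && cp <= 0x9fff) return 2; // CJK unified ideographs
  return 1;
}

function displayWidth(text, startCol = 0, tabSize = 8, charWidth = roughCharWidth) {
  let col = startCol;
  // Iterating a string with for...of yields whole code points.
  for (const ch of text.replace(ESCAPE_RE, "")) {
    if (ch === "\t") {
      col = (Math.floor(col / tabSize) + 1) * tabSize; // jump to next tab stop
    } else {
      col += charWidth(ch.codePointAt(0));
    }
  }
  return col - startCol;
}

console.log(displayWidth("한")); // 2
console.log(displayWidth("\x1b[32mX\x1b[0m")); // 1
console.log(displayWidth("ab\tc")); // 9
console.log(displayWidth("\t", 5)); // 3
```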

Instead, terminal control libraries such as ncurses are sometimes used. These don't seem to tell you string widths either, but because they track text attributes such as colour separately, and generate the control sequences themselves to position and style text, they do help with putting things on screen at given locations.

Unfortunately I don't believe there's much available beyond that; applications either ignore the complexities or handle them in ad hoc ways.


To clear up a common misconception: Horizontal Tab (HT, \t) doesn't have a width as such; it's a 'format effector', like Carriage Return or Form Feed, that repositions the cursor according to certain rules.

HT (Horizontal Tabulation): A format effector which controls the movement of the printing position to the next in a series of predetermined positions along the printing line. (Applicable also to display devices and the skip function on punched cards.)

— USA Standard Code for Information Interchange [ASCII], 1968, as reprinted in RFC 20

The most common implementation is to have fixed tab stops every eight columns:

                                1       2
                1.......9.......7.......5.....

1\tXYZ          1       XYZ
12\tXYZ         12      XYZ
1234567\tXYZ    1234567 XYZ
12345678\tXYZ   12345678        XYZ
123456789\tXYZ  123456789       XYZ

though some systems support control sequences or other ways to set the positions of the tab stops at arbitrary distances, like the ruler bar in some word processors.
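Both the fixed-interval behaviour in the table and custom stops (as with a ruler bar) can be sketched as a small tab expander. `expandTabs` and its `stops` parameter are hypothetical; the sketch assumes every non-tab character is one column wide (plain ASCII) and falls back to the common 8-column interval past the last custom stop.

```javascript
// Sketch: replace each HT with the spaces a terminal would skip over.
// `stops`: hypothetical sorted list of 0-based tab-stop columns.
// Past the last custom stop (or with none), use a fixed `interval`.
function expandTabs(line, stops = [], interval = 8) {
  let out = "";
  for (const ch of line) {
    if (ch !== "\t") {
      out += ch;
      continue;
    }
    const col = out.length; // assumes one column per character (ASCII)
    const next =
      stops.find((s) => s > col) ?? (Math.floor(col / interval) + 1) * interval;
    out += " ".repeat(next - col);
  }
  return out;
}

// Reproduces the rows of the table above.
for (const s of ["1\tXYZ", "12\tXYZ", "1234567\tXYZ", "12345678\tXYZ", "123456789\tXYZ"]) {
  console.log(expandTabs(s));
}

// Custom stops at columns 4 and 10.
console.log(JSON.stringify(expandTabs("a\tb\tc", [4, 10])));
```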

Upvotes: 3
