Noice

Reputation: 93

Determine whether a Unicode character is fullwidth or halfwidth in C++

I'm writing a terminal (console) application that is supposed to wrap arbitrary Unicode text.

Terminals usually use a monospaced (fixed-width) font, so wrapping text takes little more than counting characters, checking whether the next word still fits on the current line, and breaking the line if it does not.
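To make that counting approach concrete, here is a minimal sketch of such a naive wrapper. The function name `wrap_naive` and the input format (words already split and decoded to UTF-32) are assumptions for illustration only. It treats every code point as exactly one cell, which is precisely the assumption that fails for fullwidth characters; words longer than the line are simply left to overflow.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Naive greedy word wrap: every code point is assumed to occupy one cell.
// `words` holds already-split words as UTF-32 strings (hypothetical input
// format); `columns` is the terminal width in cells.
std::vector<std::u32string> wrap_naive(const std::vector<std::u32string>& words,
                                       std::size_t columns) {
    std::vector<std::u32string> lines;
    std::u32string current;
    for (const auto& word : words) {
        // One extra cell for the separating space unless the line is empty.
        std::size_t needed = word.size() + (current.empty() ? 0 : 1);
        if (!current.empty() && current.size() + needed > columns) {
            lines.push_back(current);  // word does not fit: start a new line
            current.clear();
        }
        if (!current.empty()) current += U' ';
        current += word;
    }
    if (!current.empty()) lines.push_back(current);
    return lines;
}
```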

The problem is that Unicode contains fullwidth characters, each of which takes up the width of 2 characters in a terminal.

Counting code points sees 1 character, but the printed character is 2 "normal" (halfwidth) characters wide. This breaks the wrapping routine, because it is not aware of characters that take up twice the width.

As an example, this is a fullwidth character (U+3004, the JIS symbol):

〄
12

It does not take up the full width of 2 characters here although it's preformatted, but it does use twice the width of a western character in a terminal.

To deal with this, I have to distinguish between fullwidth and halfwidth characters, but I cannot find a way to do so in C++. Is it really necessary to know all fullwidth characters in the Unicode table to get around the problem?

Upvotes: 9

Views: 3784

Answers (2)

kralyk

Reputation: 4397

There's no need to build the tables yourself; Markus Kuhn has already done that in his wcwidth implementation:

http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

The same code is used in terminal emulators such as xterm and konsole, and quite likely others.
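For a rough idea of how such a table lookup works, here is a minimal sketch: a sorted list of code point ranges searched with a binary search. The names `Range`, `kWideRanges` and `is_wide_approx` are made up for this example, and the ranges are an abbreviated approximation modelled on the wide ranges in that file, so they will not match current Unicode data exactly. Use the real table (or a library) in production code.

```cpp
#include <algorithm>
#include <iterator>

// Inclusive range of code points.
struct Range { char32_t first, last; };

// Abbreviated, approximate list of double-width ranges (assumption: not
// guaranteed to match any particular Unicode version). Must stay sorted.
static const Range kWideRanges[] = {
    {0x1100, 0x115F},   // Hangul Jamo initial consonants
    {0x2329, 0x232A},   // angle brackets
    {0x2E80, 0x303E},   // CJK radicals, symbols and punctuation
    {0x3040, 0xA4CF},   // kana, CJK ideographs ... Yi
    {0xAC00, 0xD7A3},   // Hangul syllables
    {0xF900, 0xFAFF},   // CJK compatibility ideographs
    {0xFE10, 0xFE19},   // vertical forms
    {0xFE30, 0xFE6F},   // CJK compatibility forms
    {0xFF00, 0xFF60},   // fullwidth forms
    {0xFFE0, 0xFFE6},   // fullwidth signs
    {0x20000, 0x2FFFD}, // supplementary ideographic plane
    {0x30000, 0x3FFFD}, // tertiary ideographic plane
};

// True if c falls inside one of the ranges above.
bool is_wide_approx(char32_t c) {
    // Find the first range whose end is not below c ...
    const Range* it = std::lower_bound(
        std::begin(kWideRanges), std::end(kWideRanges), c,
        [](const Range& r, char32_t value) { return r.last < value; });
    // ... and check that c is not below that range's start.
    return it != std::end(kWideRanges) && it->first <= c;
}
```

With this table, `is_wide_approx(0x3004)` (the JIS symbol from the question) is true, while `is_wide_approx(U'A')` is false.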

Upvotes: 5

ecatmur

Reputation: 157484

You should use ICU's u_getIntPropertyValue with the UCHAR_EAST_ASIAN_WIDTH property.

For example:

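#include <unicode/uchar.h>  // u_getIntPropertyValue, UCHAR_EAST_ASIAN_WIDTH

// True if c is East Asian Fullwidth (F) or Wide (W), i.e. occupies two cells.
bool is_fullwidth(UChar32 c) {
    int32_t width = u_getIntPropertyValue(c, UCHAR_EAST_ASIAN_WIDTH);
    return width == U_EA_FULLWIDTH || width == U_EA_WIDE;
}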

Note that if your graphics library supports combining characters then you'll have to consider those as well when determining how many cells a sequence uses; for example e followed by U+0301 COMBINING ACUTE ACCENT will only take up 1 cell.
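As a rough sketch of how that could fit together, the function below sums the cells of a code point sequence, skipping zero-width characters and counting wide ones twice. The name `display_width` and its two predicate parameters are hypothetical, not part of ICU. With ICU, one plausible choice would be `is_fullwidth` from above for the wide test and a check of `u_charType(c)` against `U_NON_SPACING_MARK` and `U_ENCLOSING_MARK` for the zero-width test, but that pairing is an assumption and does not cover every case (control characters, for instance).

```cpp
#include <cstddef>
#include <string>

// Sum the terminal cells used by a sequence of code points.
// Both predicates are supplied by the caller (hypothetical parameters):
//   is_zero_width(c) -> true for combining marks that add no cell
//   is_wide(c)       -> true for characters that occupy two cells
template <typename IsZeroWidth, typename IsWide>
std::size_t display_width(const std::u32string& text,
                          IsZeroWidth is_zero_width, IsWide is_wide) {
    std::size_t cells = 0;
    for (char32_t c : text) {
        if (is_zero_width(c)) continue;  // attaches to the previous character
        cells += is_wide(c) ? 2 : 1;
    }
    return cells;
}
```

Called with predicates that treat U+0301 as zero-width, the sequence e, U+0301 yields 1, matching the example above.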

Upvotes: 7
