Get number of characters in string?

Question

I have an application that accepts a UTF-8 string of a maximum of 255 characters.

If all the characters are ASCII, the number of characters equals the size in bytes.

If the characters are not all ASCII (for example, the string contains Japanese letters), how can I get the number of characters in the string, given its size in bytes?

Input: char *data, int bytes_no
Output: int char_no

phschoen · Accepted Answer

You can use mblen to step through the string and count the characters, or use mbstowcs.

Sources:

http://www.cplusplus.com/reference/cstdlib/mblen/

http://www.cl.cam.ac.uk/~mgk25/unicode.html#mod

The number of characters can be counted in C in a portable way using mbstowcs(NULL,s,0). This works for UTF-8 like for any other supported encoding, as long as the appropriate locale has been selected. A hard-wired technique to count the number of characters in a UTF-8 string is to count all bytes except those in the range 0x80 – 0xBF, because these are just continuation bytes and not characters of their own. However, the need to count characters arises surprisingly rarely in applications.
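The hard-wired technique from the quote can be sketched roughly as follows. The function name utf8_char_count is made up for illustration, and the sketch assumes the input is already valid UTF-8 (it does no validation). It counts every byte that is not a continuation byte, i.e. not in the range 0x80 to 0xBF.

```c
#include <stddef.h>

/* Hypothetical helper: count the UTF-8 characters (code points) in the
   first bytes_no bytes of data.
   Assumption: data holds valid UTF-8. Continuation bytes have the bit
   pattern 10xxxxxx (0x80..0xBF); every other byte starts a character. */
int utf8_char_count(const char *data, int bytes_no)
{
    int char_no = 0;

    for (int i = 0; i < bytes_no; i++) {
        unsigned char b = (unsigned char)data[i];
        if ((b & 0xC0) != 0x80) {   /* not a continuation byte */
            char_no++;
        }
    }
    return char_no;
}
```

This needs no locale setup, which is probably why it is the usual choice when the input is known to be UTF-8. Note that it counts code points, so a base letter followed by a combining mark counts as two.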

You can store a Unicode character in a wide character (wchar_t).
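As a rough sketch of the locale-based route, the function below decodes each character into a wchar_t and counts as it goes. It uses mbrtowc rather than mbstowcs, because mbrtowc takes an explicit byte limit and the question passes a byte count instead of promising a NUL-terminated string. The name decode_to_wide and the two locale names are assumptions; which UTF-8 locales are installed varies by system. Also note that wchar_t is only 16 bits wide on some platforms (Windows, for instance), so it cannot hold every Unicode character there.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode the first bytes_no bytes of data into wide
   characters. Returns the number of characters, or -1 on failure.
   out may be NULL if only the count is wanted; otherwise it must have
   room for at least bytes_no elements.
   Side effect: changes the process-wide LC_CTYPE locale. */
int decode_to_wide(const char *data, int bytes_no, wchar_t *out)
{
    /* Assumption: one of these UTF-8 locale names exists on the system. */
    if (setlocale(LC_CTYPE, "C.UTF-8") == NULL &&
        setlocale(LC_CTYPE, "en_US.UTF-8") == NULL) {
        return -1;
    }

    mbstate_t state;
    memset(&state, 0, sizeof state);    /* initial conversion state */

    int char_no = 0;
    int i = 0;
    while (i < bytes_no) {
        wchar_t wc;
        size_t len = mbrtowc(&wc, data + i, (size_t)(bytes_no - i), &state);
        if (len == (size_t)-1 || len == (size_t)-2) {
            return -1;                  /* invalid or truncated sequence */
        }
        if (len == 0) {
            len = 1;                    /* an embedded NUL byte is one character */
        }
        if (out != NULL) {
            out[char_no] = wc;
        }
        i += (int)len;
        char_no++;
    }
    return char_no;
}
```

Unlike plain byte counting, this rejects malformed input, at the cost of depending on the locale configuration of the machine it runs on.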
