Lunar Mushrooms

Reputation: 8968

What are (some of the) UTF-8 string functions in C?

For dealing with ASCII we have strlen, strcat, etc. For UTF-16 (i.e., UCS-2) we have the wcscat and wcslen functions.

For dealing with UTF-8 and UCS-4, which functions are available in C? Assume Linux/gcc.

Upvotes: 1

Views: 1831

Answers (2)

ugoren

Reputation: 16449

I don't think the standard C library has UTF-8-specific string functions. There are surely third-party libraries for that.

However, the normal str functions can be used with UTF-8 in many cases.
strlen works, but it returns the number of bytes, not the number of characters. strcat works too (it overruns your buffer just as easily, but that is normal for strcat).

The reason is that a zero byte never appears inside a multi-byte UTF-8 sequence. So if a zero byte appears in a UTF-8 string, it is surely the terminator, just like in ASCII.
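If you need the number of characters (code points) rather than bytes, a minimal sketch follows. It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so counting every byte that is not a continuation byte counts the code points. The name utf8_count_codepoints is a hypothetical helper, not a library function, and the sketch assumes the input is already valid UTF-8.

```c
#include <stddef.h>

/* Count code points in a NUL-terminated UTF-8 string.
 * Hypothetical helper: assumes the input is valid UTF-8 and does no
 * validation. Continuation bytes look like 10xxxxxx, so every byte
 * that does not match that pattern starts a new code point. */
size_t utf8_count_codepoints(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For example, for the string "h\xC3\xA9llo" ("héllo" encoded as UTF-8), strlen reports 6 bytes while this helper reports 5 code points. Note that code points are still not the same as user-visible characters when combining marks are involved.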

Upvotes: 3

harald

Reputation: 6126

The standard does not specify the encoding or size used by the wide character functions, so assuming it to be UCS-2, UCS-4 or anything else is not portable. C11 brings standardized Unicode support, but I think it's too early to rely on that being implemented yet. Your best bet is to find a library to handle conversion to/from UTF-8/UCS-4 or any other encoding you may need.

Have a look at iconv, or the chapter on character handling in the GNU C library manual.
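As a rough sketch of the iconv route, the function below converts a UTF-8 string into an array of 32-bit code points. The name utf8_to_ucs4 is a hypothetical helper. It assumes a little-endian host (so that "UTF-32LE" output can be read directly as uint32_t values) and that the iconv implementation recognizes the encoding names "UTF-8" and "UTF-32LE", as glibc does.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert a NUL-terminated UTF-8 string to UCS-4/UTF-32 code points.
 * Hypothetical helper; assumes a little-endian host so the UTF-32LE
 * bytes written by iconv can be read back as uint32_t values.
 * Returns the number of code points written to out, or (size_t)-1 on
 * failure (invalid input, output buffer too small, or unsupported
 * encoding name). */
size_t utf8_to_ucs4(const char *utf8, uint32_t *out, size_t out_cap)
{
    iconv_t cd = iconv_open("UTF-32LE", "UTF-8"); /* (to, from) */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* iconv takes char ** for the input, hence the cast. */
    char *src = (char *)utf8;
    size_t src_left = strlen(utf8);
    char *dst = (char *)out;
    size_t dst_left = out_cap * sizeof *out;

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;

    /* Bytes consumed from the output buffer, expressed in code points. */
    return out_cap - dst_left / sizeof *out;
}
```

Called with "h\xC3\xA9" and a buffer of 8 elements, this should yield two code points, 0x68 and 0xE9. A production version would inspect errno (EILSEQ, EINVAL, E2BIG) to distinguish the failure cases.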

Upvotes: 3
