How to count length of string in bytes with unicode characters of more than 1 byte?

Question

Because a string in C can contain Unicode characters that span several bytes, and one of those bytes might be a terminating \0 character, I don't think strlen works well for counting how many bytes there are in such a string.

How do I properly count the length in bytes of such a string? I'm not the one allocating the memory for it; I'm using the char d_name[256] member of struct dirent from dirent.h. Is there any way to find out how long the names are besides just copying the entire 256 bytes? And what if copying all 256 bytes weren't an option?

Mikhail Maltsev · Accepted Answer

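What do you mean by Unicode? If it's UTF-8 (dirent.h is part of the POSIX API, so the names are most likely UTF-8), the string can't contain '\0' in the middle: every byte of a multi-byte UTF-8 sequence is nonzero. strlen will do exactly what you need, since it returns the number of bytes before the terminator, not the number of characters. If you are using some non-standard version of dirent (maybe some strange port for Windows) with UTF-16, you can use the appropriate wide-character string functions, such as wcslen.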
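To make the point concrete, here is a minimal sketch, assuming a POSIX system and names encoded as valid UTF-8. The helper names (utf8_code_points, wide_byte_length, print_name_lengths) are made up for illustration and are not part of any library. It shows that strlen already gives the byte length of d_name, how that differs from the number of code points, and how a byte length could be derived for a wide-character string.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count code points in a NUL-terminated string.
 * Assumes valid UTF-8: every byte that is not a continuation byte
 * (bit pattern 10xxxxxx) starts a new code point. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

/* Hypothetical helper: byte length of a wide string, terminator excluded.
 * wcslen counts wchar_t units, so multiply by the unit size. */
size_t wide_byte_length(const wchar_t *s)
{
    return wcslen(s) * sizeof(wchar_t);
}

/* Hypothetical helper: print the byte length of every name in a directory.
 * Returns the number of entries read, or -1 if the directory can't be opened. */
long print_name_lengths(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* strlen stops at the first '\0', which in UTF-8 is only the terminator */
        printf("%s: %zu bytes\n", entry->d_name, strlen(entry->d_name));
        ++count;
    }
    closedir(dir);
    return count;
}
```

Calling print_name_lengths(".") from main would list the current directory. If you need a copy of a name, strlen(entry->d_name) + 1 bytes is enough; there is no need to copy the whole array.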
