Reputation: 163
I need to determine the length of a UTF-8 string in bytes in C. How do I do it correctly? As far as I know, the terminating character in UTF-8 is 1 byte in size. Can I use the strlen function for this?
Upvotes: 3
Views: 6651
Reputation: 183873
Can I use the strlen function for this?
Yes, strlen gives you the number of bytes before the first '\0' character, so strlen(utf8) + 1 is the number of bytes in utf8 including the 0-terminator, since no character other than '\0' contains a 0 byte in UTF-8.
Of course, that only works if utf8 is actually UTF-8 encoded; otherwise you need to convert it to UTF-8 first.
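As a minimal sketch of the above, the two quantities can be wrapped in small helpers. The names utf8_byte_length and utf8_storage_size are made up for illustration, and the code assumes the input really is a NUL-terminated, UTF-8 encoded string:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of bytes in a NUL-terminated UTF-8 string,
 * not counting the terminating '\0'.
 * Hypothetical helper: it is just strlen under another name. */
size_t utf8_byte_length(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Number of bytes needed to store the string,
 * including the terminating '\0'. */
size_t utf8_storage_size(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}
```

For instance, "h" "\xC3\xA9" "llo" (the word "héllo" with é written as its two UTF-8 bytes) has a byte length of 6 and a storage size of 7, even though it has only 5 characters.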
Upvotes: 11
Reputation: 71899
Yes, strlen() will simply count the bytes until it encounters the NUL, which is the correct terminator for a 0-terminated UTF-8-encoded C string.
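To make the "count until NUL" behaviour concrete, here is a rough hand-written equivalent. The name count_bytes_until_nul is hypothetical, and this is only an illustration of the idea, not how any particular C library implements strlen:

```c
#include <assert.h>
#include <stddef.h>

/* Count bytes up to, but not including, the first '\0'.
 * Every byte of a multi-byte UTF-8 sequence has its high bit set,
 * so none of them can be mistaken for the terminator. */
size_t count_bytes_until_nul(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != '\0') {
        n++;
    }
    return n;
}
```

In real code, calling strlen directly is the better choice; the loop is only there to show why multi-byte characters do not confuse the count.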
Upvotes: 2