Will the functions in GCC's string.h break UTF-8 strings?

Question

I'm unsure about the following cases in GCC. Can anyone help?

Can a valid UTF-8 character (other than code point 0) contain a zero byte? If so, I think functions such as strlen would break that UTF-8 character.
Can a valid UTF-8 character contain a byte whose value is equal to '\n'? If so, I think functions such as gets would break that UTF-8 character.
Can a valid UTF-8 character contain a byte whose value is equal to ' ' or '\t'? If so, I think a call such as scanf("%s%s") would split that UTF-8 character and interpret it as two or more words.

Yu Hao · Accepted Answer

The answer to all of your questions is the same: no.

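This is one of the advantages of UTF-8: no ASCII byte ever occurs in the encoding of a non-ASCII code point. Every byte of a multi-byte sequence has its high bit set (values 0x80 and above), so bytes such as 0x00, '\n', ' ', and '\t' can only ever represent the ASCII characters themselves.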
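To make the claim concrete, here is a minimal sketch of a UTF-8 encoder together with a check for ASCII-range bytes. The names utf8_encode and contains_ascii_byte are hypothetical helpers written for this illustration; they are not part of string.h or any GCC library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8.
   Returns the number of bytes written (1 to 4), or 0 if cp is a
   surrogate (U+D800..U+DFFF) or above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* ASCII: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: returns 1 if any of the len bytes lies in the
   ASCII range 0x00..0x7F, otherwise 0. */
int contains_ascii_byte(const unsigned char *bytes, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (bytes[i] < 0x80)
            return 1;
    return 0;
}
```

Every lead byte of a multi-byte sequence is ORed with at least 0xC0 and every continuation byte with 0x80, so contains_ascii_byte returns 0 for the encoding of any code point from U+0080 upward. That is why a scan for '\0', '\n', or whitespace cannot stop in the middle of a character.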

For example, you can safely use strlen on a UTF-8 string. Just keep in mind that the result is the number of bytes, not the number of code points.
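If you need the number of code points rather than bytes, one possible approach is to count every byte that is not a continuation byte (bit pattern 10xxxxxx). The function name utf8_count_code_points is hypothetical, and the sketch assumes the input is already valid, NUL-terminated UTF-8; it does no validation.

```c
#include <stddef.h>

/* Hypothetical helper: count code points in a NUL-terminated string
   that is assumed to be valid UTF-8. Each code point has exactly one
   byte that is not a continuation byte (10xxxxxx), so counting those
   bytes counts the code points. */
size_t utf8_count_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    return count;
}
```

For the string "h" "\xc3\xa9" "llo" (the word "héllo", where é is encoded as the two bytes C3 A9), strlen returns 6 while utf8_count_code_points returns 5.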
