ghostplant

Reputation: 87

Will the string.h functions in GCC break UTF-8 strings?

I'm unsure about the following cases in GCC. Can anyone help?

  1. Can a valid UTF-8 character (other than code point 0) contain a zero byte? If so, I think a function such as strlen will break that UTF-8 character.

  2. Can a valid UTF-8 character contain a byte whose value is equal to '\n'? If so, I think a function such as "gets" will break that UTF-8 character.

  3. Can a valid UTF-8 character contain a byte whose value is equal to ' ' or '\t'? If so, I think a function such as scanf("%s%s") will break that UTF-8 character and interpret it as two or more words.

Upvotes: 0

Views: 133

Answers (1)

Yu Hao

Reputation: 122433

The answer to all your questions is the same: No.

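It's one of the advantages of UTF-8: no ASCII byte (0x00 to 0x7F) ever occurs in the encoding of a non-ASCII code point. Every byte of a multi-byte sequence has its high bit set, so '\0', '\n', ' ' and '\t' can only appear as themselves.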
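If you want to convince yourself of this, a brute-force check is small enough to write out. The following is only a sketch: utf8_encode and count_ascii_bytes_in_multibyte are hypothetical helper names, not part of string.h or any GCC library. It encodes every code point from U+0080 to U+10FFFF (skipping surrogates) and counts any encoded byte below 0x80.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8 into buf
 * (buf must hold at least 4 bytes). Returns the number of bytes
 * written, or 0 if cp is a surrogate or above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *buf)
{
    assert(buf != NULL);
    if (cp < 0x80) {
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid in UTF-8 */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: count bytes below 0x80 found in the encodings
 * of all non-ASCII code points. */
unsigned long count_ascii_bytes_in_multibyte(void)
{
    unsigned long violations = 0;
    unsigned char buf[4];
    for (uint32_t cp = 0x80; cp <= 0x10FFFF; ++cp) {
        size_t n = utf8_encode(cp, buf);
        for (size_t i = 0; i < n; ++i) {
            if (buf[i] < 0x80)
                ++violations;
        }
    }
    return violations;
}
```

Calling count_ascii_bytes_in_multibyte() should return 0, since lead bytes start at 0xC0 and continuation bytes always have the form 10xxxxxx.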

For example, you can safely use strlen on a UTF-8 string; just keep in mind that the result is the number of bytes, not the number of code points.
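To illustrate the difference between bytes and code points, here is a minimal sketch. utf8_count_code_points is a hypothetical helper, not a standard function, and it assumes the input is already valid, NUL-terminated UTF-8. It counts every byte that is not a continuation byte (10xxxxxx).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count code points in a valid UTF-8 string by
 * skipping continuation bytes, which all match the pattern 10xxxxxx. */
size_t utf8_count_code_points(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For the string "h\xC3\xA9llo" (the word "héllo"), strlen returns 6 while utf8_count_code_points returns 5, because "é" takes two bytes.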

Upvotes: 5
