Reputation: 55
Why does my function count more characters than expected?
    int countLength(char* buffer){
        int cnt = 0;
        for (int i=0; buffer[i] != '\n' && buffer[i] != '\0'; i++){
            cnt++;
        }
        return cnt;
    }
For example, if I pass it "Será chuva? Será gente?" as input, it returns 25 instead of 23. Why is that?
Upvotes: 1
Views: 76
Reputation: 45694
The code gives you the right answer, even if it is not the answer you expect.
The problem is that you expect it to count graphemes (like á), while it counts bytes, i.e. UTF-8 code units (á consists of two code units in UTF-8 under Normalization Form C, the composed form).
A first approximation would be to count code points instead, by skipping continuation bytes (bytes > 0x7F and < 0xC0, i.e. 0x80 to 0xBF). To actually count graphemes, you would have to use a proper Unicode library with all the character information, such as International Components for Unicode (ICU), and accept its decisions.
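A minimal sketch of that first approximation, assuming the input is valid UTF-8 (count_code_points is a hypothetical name, not from the question). It keeps the question's stopping rule ('\n' or '\0') and counts every byte that is not a continuation byte, i.e. every byte whose top two bits are not 10:

```c
#include <stddef.h>

/* Counts UTF-8 code points up to the first '\n' or '\0'.
 * Assumes valid UTF-8: each code point has exactly one non-continuation
 * byte, so skipping continuation bytes counts each code point once. */
size_t count_code_points(const char *buffer)
{
    size_t count = 0;
    for (size_t i = 0; buffer[i] != '\n' && buffer[i] != '\0'; i++) {
        unsigned char byte = (unsigned char)buffer[i];
        /* Continuation bytes have the bit pattern 10xxxxxx (0x80..0xBF). */
        if ((byte & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

Called on the question's string (written with escapes so the source-file encoding does not matter), count_code_points("Ser\xC3\xA1 chuva? Ser\xC3\xA1 gente?") returns 23. It still counts code points, not graphemes: a decomposed á (a followed by U+0301 COMBINING ACUTE ACCENT) would be counted as 2.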
Read up on character sets, especially Unicode and the UTF-8 encoding.
As an aside, cnt always mirrors i. While an optimizing compiler will probably remove this duplication, it shouldn't be there in the first place.
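A sketch of the same byte count without the duplicate counter, letting the loop index be the result (count_bytes is a hypothetical name, and size_t replaces int as the return type since the result is an object length):

```c
#include <stddef.h>

/* Returns the number of bytes before the first '\n' or '\0'.
 * The index itself is the count, so no separate counter is kept. */
size_t count_bytes(const char *buffer)
{
    size_t i = 0;
    while (buffer[i] != '\n' && buffer[i] != '\0') {
        i++;
    }
    return i;
}
```

The standard strcspn(buffer, "\n") from <string.h> computes the same value, since it stops at the first '\n' or at the terminating '\0'.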
Upvotes: 2