Reputation: 7711
char *w = "Artîsté";
printf("%lu\n", strlen(w));
int z;
for(z=0; z<strlen(w); z++){
    //printf("%c", w[z]); //prints as expected
    printf("%i: %c\n", z, w[z]);//doesn't print anything
}
If I run this, it fails at the î. How do I print a multibyte char, and how do I know when I've hit a multibyte character?
Upvotes: 0
Views: 919
Reputation: 7655
You can use mbtowc to convert a multibyte character sequence into a wide character. The function also returns the number of bytes it consumed, so you can use that to advance through the string:
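#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

int main(void)
{
    setlocale(LC_ALL, "");            // use the environment's locale (e.g. UTF-8)
    wchar_t c;
    const char *s = "Artîsté";
    mbtowc(NULL, NULL, 0);            // reset internal conversion state
    while (*s) {
        int n = mbtowc(&c, s, MB_CUR_MAX);  // bytes in this character
        if (n < 0) break;             // invalid sequence for this locale
        s += n;
        printf("%lc\n", (wint_t)c);
    }
    return 0;
}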
Reference: mbtowc
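mbtowc keeps its shift state in hidden internal storage, which is why the reset call is needed, and the C standard does not require it to be free of data races. A variant of the same loop that swaps in the restartable mbrtowc, with an explicit mbstate_t, might look like the sketch below. count_mb_chars is a hypothetical name, and the sketch assumes setlocale has already been called.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: counts characters (not bytes) in a null-terminated
 * multibyte string under the current LC_CTYPE locale, using mbrtowc() with
 * an explicit mbstate_t instead of mbtowc()'s hidden state.
 * Returns (size_t)-1 if an invalid or truncated sequence is found. */
size_t count_mb_chars(const char *s)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);   /* start in the initial shift state */
    size_t count = 0;
    size_t left = strlen(s);           /* bytes remaining, excluding '\0' */

    while (left > 0) {
        /* A null pwc is allowed: we only need the byte count. */
        size_t n = mbrtowc(NULL, s, left, &state);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;         /* invalid (-1) or incomplete (-2) */
        s += n;                        /* n >= 1: no '\0' before the end */
        left -= n;
        count++;
    }
    return count;
}
```

After setlocale(LC_ALL, "") in a UTF-8 locale, and with the source file saved as UTF-8, count_mb_chars("Artîsté") should return 7 while strlen reports 9.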
Upvotes: 0
Reputation: 11162
Use the wide char and multi-byte functions:
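#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

// Count characters (not bytes) in the current locale; negative on an invalid sequence.
int utf8len(const char *str)
{
    int len, inc;
    // mblen(NULL, 0) is needed to reset internal conversion state
    for (len = 0, mblen(NULL, 0); *str; len++, str += inc)
        if ((inc = mblen(str, MB_CUR_MAX)) < 0)
            return inc;
    return len;
}

int main(void)
{
    setlocale(LC_ALL, "");
    const char *w = "Artîsté";
    printf("%zu\n", strlen(w));             // bytes, not characters
    int z, len = utf8len(w);
    if (len < 0)
        return 1;                           // not valid in the current locale
    wchar_t wstr[len + 1];
    mbstowcs(wstr, w, len + 1);             // converts and null-terminates
    for (z = 0; z < len; z++)
        printf("%i: %lc\n", z, (wint_t)wstr[z]);
    return 0;
}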
You got lucky with the first printf because you never changed the data. Once you split up the chars (putting a newline and the next index between the two bytes of î), your output was no longer valid UTF-8.
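If you can assume the data is UTF-8 regardless of locale, a different approach from the wide-char conversion above is to find character boundaries from the lead byte and print each character's bytes together. In this sketch utf8_char_len and print_utf8_chars are hypothetical names; the check only looks at bit patterns, so it does not reject overlong encodings or surrogates.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, assuming UTF-8 input (no locale involved): returns the
 * byte length of the character starting at s, or 0 for the terminating '\0',
 * a stray continuation byte, or a truncated/malformed sequence. */
size_t utf8_char_len(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t len;

    if (p[0] == 0)                  return 0;
    else if (p[0] < 0x80)           return 1;  /* 0xxxxxxx: ASCII */
    else if ((p[0] & 0xE0) == 0xC0) len = 2;   /* 110xxxxx */
    else if ((p[0] & 0xF0) == 0xE0) len = 3;   /* 1110xxxx */
    else if ((p[0] & 0xF8) == 0xF0) len = 4;   /* 11110xxx */
    else                            return 0;  /* 10xxxxxx or invalid lead */

    for (size_t i = 1; i < len; i++)
        if ((p[i] & 0xC0) != 0x80)  /* tail bytes must be 10xxxxxx; stops at '\0' */
            return 0;
    return len;
}

/* Hypothetical helper: prints "index: char" per line, writing each multibyte
 * sequence as one unit so the output stays valid UTF-8.
 * Returns the number of characters, or -1 on malformed input. */
int print_utf8_chars(const char *s)
{
    int index = 0;
    while (*s) {
        size_t n = utf8_char_len(s);
        if (n == 0)
            return -1;
        printf("%d: %.*s\n", index++, (int)n, s);  /* exactly n bytes */
        s += n;
    }
    return index;
}
```

With a UTF-8 source file, print_utf8_chars("Artîsté") should print seven lines numbered 0 to 6, with î and é each printed whole.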
Upvotes: 1
Reputation: 47408
If your execution environment uses UTF-8 (Linux, for example), your code will work as-is as long as you set a suitable locale, for example with setlocale(LC_ALL, "en_US.utf8"); before calling that printf.
demo: http://ideone.com/zFUYM
Otherwise, your best bet is probably to convert to a wide string and print that. If you plan to do anything other than I/O with the individual characters of that string, you will have to convert it anyway.
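One way to do that conversion without knowing the length in advance is the two-pass mbsrtowcs idiom: a first call with a null destination returns the number of wide characters needed, and a second call fills the buffer. A minimal sketch, assuming setlocale has already been called; to_wide is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts a multibyte string (current LC_CTYPE locale)
 * to a newly malloc'd wide string. Returns NULL on an invalid sequence or
 * allocation failure; the caller must free() the result. */
wchar_t *to_wide(const char *s)
{
    mbstate_t state;
    const char *src = s;

    memset(&state, 0, sizeof state);
    size_t len = mbsrtowcs(NULL, &src, 0, &state);   /* pass 1: count only */
    if (len == (size_t)-1)
        return NULL;

    wchar_t *ws = malloc((len + 1) * sizeof *ws);
    if (ws == NULL)
        return NULL;

    src = s;                                         /* pass 2: convert */
    memset(&state, 0, sizeof state);
    mbsrtowcs(ws, &src, len + 1, &state);            /* also stores L'\0' */
    return ws;
}
```

The result can then be printed with printf("%ls\n", ws), which converts it back to multibyte in the current locale. Avoid mixing printf and wprintf on stdout, since the stream's orientation is fixed by the first call.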
As for detecting a multibyte character, the portable test is whether mblen() returns a value greater than 1.
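Applied over a whole string, the mblen() > 1 test might look like the sketch below, assuming setlocale has already been called; count_multibyte_chars is a hypothetical name. In a UTF-8 locale, "Artîsté" should report 2 (î and é).

```c
#include <stdlib.h>

/* Hypothetical helper: returns how many characters in s take more than one
 * byte in the current LC_CTYPE locale, or -1 on an invalid sequence. */
int count_multibyte_chars(const char *s)
{
    int multi = 0;
    mblen(NULL, 0);                     /* reset any shift state */
    while (*s) {
        int n = mblen(s, MB_CUR_MAX);   /* bytes in the character at s */
        if (n < 0)
            return -1;                  /* invalid for this locale */
        if (n > 1)
            multi++;                    /* a multibyte character starts here */
        s += n;
    }
    return multi;
}
```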
Upvotes: 1