Here is some test code to help me understand multibyte character management.
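#define _POSIX_C_SOURCE 200809L /* exposes getline() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]) {
    size_t n = 1024;            /* getline() must be told the buffer size */
    char *line = malloc(n);
    if (line == NULL) return 1;
    if (getline(&line, &n, stdin) == -1) { free(line); return 1; }
    char *p = line;             /* walk with a cursor so line can still be freed */
    while (*p) {
        int offset = mblen(p, strlen(p));   /* bytes in the next character */
        if (offset == -1) { free(line); return 0; }
        printf("%d\n", offset);
        p += offset;
    }
    free(line);
    return 0;
}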
As I understand it, if the user were to type "éléphant", my output should show 2 1 2 1 ...
However, mblen returns -1 (an error) right from the first byte.
I gather this probably isn't a bug in these two lines of code. What must I do, and what resources can I read to get a hint about what is happening here?
Of course, a plain printf("%s", line) works perfectly.
Turning my comment into an answer.
The details might depend on your exact execution environment, but I think the following should apply to most *nix systems.
mblen depends on the current locale:
"The behavior of this function is affected by the LC_CTYPE category of the current locale."
The default locale at program startup is the "C" locale (see setlocale), which might not match what you're expecting. Conveniently, you can call setlocale(LC_CTYPE, "") to switch to the "native" environment, i.e. the locale named by the environment variables LC_ALL, LC_CTYPE or LANG.
Note that calling setlocale(LC_ALL, "") (as I originally wrote) changes more than you might expect; for example, LC_NUMERIC controls the decimal-point character used by printf and strtod. So be sure to read up on all things locale-related before doing that.