pouzzler

Reputation: 1834

How do I use mblen()?

Here is some test code to help me understand multibyte character management.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
    char * line = malloc(1024);
    size_t n = 1024;  /* getline() reads the buffer size from n when line is non-NULL */

    getline(&line, &n, stdin);
    while (*line) {
        int offset = mblen(line, strlen(line));
        if (offset == -1) return 0;
        printf("%d\n", offset);
        line += offset;
    }
    return 0;
}

As I understand it, if the user were to type "éléphant", my output should show 2 1 2 1 ...
However, it shows -1 for an mblen error, right from the first byte. I gather the bug probably isn't in these 2 lines of code. What must I do, and what resources can I read to get a hint about what is happening here?
Of course, a printf("%s", line) works perfectly.

Upvotes: 5

Views: 532

Answers (1)

user786653

Reputation: 30450

Turning my comment into an answer.

The details might depend on your exact execution environment, but I think the following should apply for most *NIX-systems.

mblen depends on the current locale. Its documentation states:

"The behavior of this function is affected by the LC_CTYPE category of the current locale."

The default locale on startup is the "C" locale (see setlocale, declared in <locale.h>), which might not match what you're expecting. Conveniently, you can call setlocale(LC_CTYPE, "") to set the locale to the "native" environment.
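One way to see the difference between the startup locale and the native one is to compare MB_CUR_MAX, the maximum number of bytes per character in the current locale. This is only a rough diagnostic sketch; max_bytes_in_locale is a hypothetical helper name, not a library function.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: switch LC_CTYPE to the named locale and report
   the maximum number of bytes a character can occupy in it.
   Returns 0 if the locale could not be set. */
size_t max_bytes_in_locale(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return 0;
    return MB_CUR_MAX;
}
```

max_bytes_in_locale("C") reports 1, since the "C" locale has no multibyte characters. max_bytes_in_locale("") should report a larger value when the environment selects a multibyte encoding such as UTF-8. Note that the helper leaves the locale changed as a side effect.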

Note that calling setlocale(LC_ALL, "") (as I originally wrote) changes more than you're possibly expecting, so be sure to read up on all things locale-related before doing that.
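Putting this together, here is a minimal sketch of the loop from the question with the locale set first. Both use_native_ctype and char_lengths are hypothetical helper names introduced for illustration, and the sketch assumes the environment selects a multibyte locale such as UTF-8.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: switch LC_CTYPE to the environment's locale.
   Returns 1 on success, 0 if that locale is unavailable. */
int use_native_ctype(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}

/* Hypothetical helper: store the byte length of each character of s in
   lens (at most max entries). Returns the number of characters stored,
   or -1 if mblen() rejects a sequence under the current locale. */
int char_lengths(const char *s, int *lens, int max)
{
    size_t remaining = strlen(s);
    int count = 0;

    (void)mblen(NULL, 0);              /* reset any shift state */
    while (remaining > 0 && count < max) {
        int len = mblen(s, remaining);
        if (len <= 0)                  /* invalid or incomplete sequence */
            return -1;
        lens[count++] = len;
        s += len;                      /* advance a copy, not the malloc'd pointer */
        remaining -= (size_t)len;
    }
    return count;
}
```

In the question's program this would mean calling use_native_ctype() once at the top of main and then passing the line read by getline to char_lengths. Under a UTF-8 environment, "éléphant" should then give the 2 1 2 1 ... pattern the question expects. Because the helper walks a copy of the pointer, the original buffer can still be passed to free afterwards.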

Upvotes: 5
