Clang locale issue on macOS when handling wide characters

Question

I'm currently working on a C++ project on macOS, using Clang as my compiler. I've encountered a problem related to the locale settings when dealing with wide characters. Here is a simplified version of my code:

#include <iostream>
#include <locale>
#include <string>
using namespace std;
int main() {
    locale zhLocale("");
    wcin.imbue(zhLocale);
    wcout.imbue(zhLocale);

    wstring input;
    getline(wcin, input);
    wcout << input << endl;

    return 0;
}

The input is:

你好

The output is:

你你你好

In the debugger, right after getline returns, the input variable already holds the wrong value: L"\U00000002\U00000002你你你好"

Here is my environment:

$ clang++ --version
Apple clang version 16.0.0 (clang-1600.0.26.6)
Target: arm64-apple-darwin24.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

I would appreciate it if anyone could help me figure out what's going wrong and how to fix it. Is this a bug in Clang's handling of locale settings on macOS, or am I doing something wrong in my code?

I believe the code above is correct, and I expected the output to match the input.

When I remove the imbue calls, the program echoes the input correctly, just as it would with cin:

#include <iostream>
#include <locale>
#include <string>
using namespace std;
int main() {
    wstring input;
    getline(wcin, input);
    wcout << input << endl;

    return 0;
}

Input:

你好

Output:

你好

However, when I open the debugger and inspect input, its data array contains [L'\U0000fffd', L'\U00000001', L'\U00000006', L'\0', L' '] instead of ['你', '好'] as I expected, so I can't iterate over the individual Chinese characters. The same is true with cin: I can't iterate over individual Chinese characters there either.
