mimosa

Reputation: 135

convert from char to char16_t

My config:

I have this method:

static inline std::u16string StringtoU16(const std::string &str) {
    const size_t si = strlen(str.c_str());
    char16_t cstr[si+1];
    memset(cstr, 0, (si+1)*sizeof(char16_t));
    const char* constSTR = str.c_str();
    mbstate_t mbs;
    memset (&mbs, 0, sizeof (mbs));//set shift state to the initial state
    size_t ret = mbrtoc16 (cstr, constSTR, si, &mbs);
    std::u16string wstr(cstr);
    return wstr;
}

I want a conversion from char to char16_t, pretty much (via std::string and std::u16string, to simplify memory management), but regardless of the size of the input str, it returns only the first character: if str = "Hello", it returns "H". I am not sure what is wrong with my method. The value of ret is 1.

Upvotes: 4

Views: 8513

Answers (2)

mimosa

Reputation: 135

I didn't know mbrtoc16() can only handle one character at a time... what a turtle. Here is the code I came up with, and it works like a charm:

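#include <cstring> // memset
#include <cuchar>  // mbrtoc16, mbstate_t
#include <string>

static inline std::u16string StringtoU16(const std::string &str) {
    std::u16string wstr = u"";
    char16_t c16str[3] = u"\0";
    mbstate_t mbs;
    for (const auto& it: str){
        memset (&mbs, 0, sizeof (mbs));//set shift state to the initial state
        memset (c16str, 0, sizeof (c16str));//clear all 3 char16_t (6 bytes, not 3)
        //converts at most one character starting at this byte; bytes that cannot
        //start a character (e.g. UTF-8 continuation bytes) fail and leave c16str empty
        mbrtoc16 (c16str, &it, 3, &mbs);
        wstr.append(std::u16string(c16str));
    }//for
    return wstr;
}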

For its counterpart (when one direction is needed, sooner or later the other will be needed too):

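#include <climits> // MB_LEN_MAX
#include <cstring> // memset
#include <cuchar>  // c16rtomb, mbstate_t
#include <string>

static inline std::string U16toString(const std::u16string &wstr) {
    std::string str = "";
    char cstr[MB_LEN_MAX];//room for the longest possible multibyte character
    mbstate_t mbs;
    memset (&mbs, 0, sizeof (mbs));//initial shift state; kept across calls so surrogate pairs combine
    for (const auto& it: wstr){
        size_t n = c16rtomb (cstr, it, &mbs);
        if (n == static_cast<size_t>(-1)) {//not representable in this locale: skip it
            memset (&mbs, 0, sizeof (mbs));//the state is unspecified after an error
            continue;
        }
        str.append(cstr, n);//the output is not null-terminated, so use the byte count
    }//for
    return str;
}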

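Be aware that this conversion is lossy: if a char16_t cannot be represented in the current locale's multibyte encoding, c16rtomb fails for it and that character is simply dropped (depending on your system, you might also end up seeing a bunch of '?'). The function itself still runs without complaint.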
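If you would rather know when that happens than lose characters silently, the return value of c16rtomb tells you: it is (size_t)-1 for a character that cannot be converted. Below is a minimal sketch; the name U16toStringCounted and its failed out-parameter are made up for illustration, and the target encoding is whatever the current LC_CTYPE locale selects.

```cpp
#include <cassert>
#include <climits>   // MB_LEN_MAX
#include <cuchar>    // std::c16rtomb, std::mbstate_t
#include <string>

// Hypothetical variant of U16toString: same conversion, but it counts the
// char16_t units that c16rtomb() rejects instead of dropping them silently.
std::string U16toStringCounted(const std::u16string &wstr, std::size_t &failed) {
    std::string str;
    std::mbstate_t mbs{};            // initial shift state, kept across calls
    char buf[MB_LEN_MAX];            // big enough for any single multibyte character
    failed = 0;
    for (char16_t c : wstr) {
        std::size_t n = std::c16rtomb(buf, c, &mbs);
        if (n == static_cast<std::size_t>(-1)) {
            ++failed;                // cannot be represented in this locale
            mbs = std::mbstate_t{};  // the state is unspecified after an error
            continue;
        }
        str.append(buf, n);          // n can be 0, e.g. after a high surrogate waiting for its low half
    }
    return str;
}
```

With the default "C" locale, ASCII text converts unchanged and failed stays 0; call setlocale(LC_ALL, "") first if you want the user's (for example UTF-8) encoding.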

Upvotes: 4

Sam Varshavchik

Reputation: 118340

mbrtoc16() converts a single character and returns the number of bytes of the multibyte string that were consumed to produce that char16_t.
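This is exactly what the question's code runs into. The tiny sketch below (the helper name convert_one is hypothetical) makes a single call on "Hello": even though n allows the whole string, it stores u'H' and returns 1, the same ret value reported in the question.

```cpp
#include <cassert>
#include <cstring>  // std::strlen
#include <cuchar>   // std::mbrtoc16, std::mbstate_t

// Hypothetical helper: makes exactly one mbrtoc16() call on s, stores the
// resulting char16_t in out and returns mbrtoc16()'s return value.
std::size_t convert_one(const char *s, char16_t &out) {
    std::mbstate_t state{};  // zero-initialized: the initial shift state
    // n covers the whole string, but mbrtoc16() still stops after one character
    return std::mbrtoc16(&out, s, std::strlen(s), &state);
}
```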

In order to effect this conversion, the general approach is:

A) Call mbrtoc16().

B) Save the converted character and skip past the bytes that were consumed.

C) Have you consumed the entire string you wanted to convert? If not, go back to step A.

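Additionally, there could be conversion errors. You must check the return value of mbrtoc16() and handle conversion errors in whatever way suits your program: (size_t)-1 means the original multibyte string is not valid, and (size_t)-2 means it ends in the middle of a character. Note that (size_t)-3 is not an error: it means this call returned the second char16_t of a surrogate pair produced by the previous call, without consuming any input.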
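Steps A to C plus the error check can be put together as in the following minimal sketch. The name mb_to_u16 is hypothetical, throwing std::runtime_error is only one possible error policy, and the bytes are interpreted according to the current locale (call setlocale, e.g. setlocale(LC_ALL, ""), before converting non-ASCII text). It also collects the second char16_t of a surrogate pair, which mbrtoc16() hands out on the following call with a return value of (size_t)-3.

```cpp
#include <cassert>
#include <cuchar>     // std::mbrtoc16, std::mbstate_t
#include <stdexcept>  // std::runtime_error
#include <string>

// Hypothetical helper illustrating steps A-C with error checking.
std::u16string mb_to_u16(const std::string &str) {
    std::u16string out;
    std::mbstate_t state{};         // one conversion state for the whole string
    const char *p = str.data();
    std::size_t left = str.size();  // bytes not yet consumed
    bool pending = false;           // a high surrogate was just produced
    while (left > 0 || pending) {   // C) repeat until everything is consumed
        char16_t c16 = 0;
        std::size_t rc = std::mbrtoc16(&c16, p, left, &state);  // A)
        if (rc == static_cast<std::size_t>(-1))
            throw std::runtime_error("invalid multibyte sequence");
        if (rc == static_cast<std::size_t>(-2))
            throw std::runtime_error("input ends in the middle of a character");
        out.push_back(c16);         // B) save the converted char16_t
        pending = (c16 >= 0xD800 && c16 <= 0xDBFF);
        if (rc == static_cast<std::size_t>(-3))
            continue;               // low surrogate from the previous call: no bytes consumed
        if (rc == 0)
            rc = 1;                 // embedded '\0': one byte in ASCII-compatible encodings
        p += rc;                    // B) skip the consumed bytes
        left -= rc;
    }
    return out;
}
```

Because the result is grown with push_back, nothing has to be assumed about how many char16_t units the input will produce.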

Finally, you should not assume that the char16_t string will be no longer than the multibyte string. It probably will be, but in some weird locale I suppose it could, theoretically, be longer.

Upvotes: 1
