Fornax-A

Reputation: 1032

How to convert from utf32 to Unicode String

I have been searching for an answer to this question for some time, but I keep ending up with answers to something different.

I have the following UTF-32 string: std::u32string utf32s = U"जि"; and I would like to convert it to a UnicodeString: UnicodeString ustr;

I am using the ICU 65.1 library in C++ to handle Unicode strings for normalization and composition. I found the following link, which describes the conversion between strings very poorly, especially the following description:

  1. Conversion of whole strings: u_strFromUTF32() and u_strToUTF32() in ustring.h.

  2. Access to code points is trivial and does not require any macros.

  3. Using a UTF-32 converter with all of the ICU conversion APIs in ucnv.h, including ones with an "Algorithmic" suffix.

  4. UnicodeString has fromUTF32() and toUTF32() methods.

The alternative I have found is the following template function:

template <typename T>
void fromUTF32(const std::u32string& source, std::basic_string<T, std::char_traits<T>, std::allocator<T>>& result)
{
    wstring_convert<codecvt_utf8_utf16<T>, T> convertor;
    result = convertor.from_bytes(source);
}

This function, however, does not seem to accept UnicodeString as a valid argument. More generally, given a string (wstring, string, u16string, ...), how can I write a template function that returns it as a UnicodeString?

Many thanks!

Upvotes: 1

Views: 1428

Answers (2)

Shawn

Reputation: 52336

#include <iostream>
#include <string>
#include <unicode/unistr.h>
#include <unicode/ustream.h>

int main() {
  std::u32string utf32s = U"जि";
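  // Qualify with icu:: explicitly; recent ICU releases no longer pull the
  // icu namespace into the global scope by default.
  auto ustr = icu::UnicodeString::fromUTF32(
      reinterpret_cast<const UChar32 *>(utf32s.c_str()), utf32s.size());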
  std::cout << ustr << '\n';

  return 0;
}

$ g++ u32.cpp $(icu-config --cxxflags --ldflags --ldflags-icuio)
$ ./a.out
जि
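
For intuition about what the conversion involves: UnicodeString stores its text as UTF-16, so every code point above U+FFFF turns into a surrogate pair and the length in code units can grow. The following is only a standard-library sketch of that encoding step, not ICU's implementation. The function name utf32_to_utf16 is made up for illustration, and replacing invalid input with U+FFFD is an assumption of this sketch.

```cpp
#include <string>

// Hypothetical helper: encode UTF-32 code points as UTF-16 code units.
// Assumption of this sketch: invalid input (surrogate code points or
// values above U+10FFFF) is replaced with U+FFFD.
std::u16string utf32_to_utf16(const std::u32string& in) {
    std::u16string out;
    out.reserve(in.size());
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            out.push_back(static_cast<char16_t>(0xFFFD));
        } else if (cp < 0x10000) {
            // Basic Multilingual Plane: one code unit, same value.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary plane: split into a high and a low surrogate.
            const char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return out;
}
```

For the string in the question both code points are below U+10000, so the UTF-16 form has the same two code units. A character such as U+1F600 would take two.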

Upvotes: 3

Maxim Egorushkin

Reputation: 136208

You should probably use icu::UnicodeString::fromUTF32:

icu::UnicodeString asUnicodeString(std::u32string const& s) {
    static_assert(sizeof(std::u32string::value_type) == sizeof(UChar32), "");
    static_assert(alignof(std::u32string::value_type) == alignof(UChar32), "");
    return icu::UnicodeString::fromUTF32(reinterpret_cast<UChar32 const*>(s.data()), s.size());
}

Upvotes: 2
