Reputation: 4774
Somehow I couldn't find the answer on Google. Probably I'm using the wrong terminology when searching. I'm trying to perform a simple task: convert a number that represents a character into the character itself, as in this table: http://unicode-table.com/en/#0460
For example, if my number is 47 (which is '/'), I can just put 47 in a char and print it using cout, and I will see a slash in the console (there is no problem for numbers lower than 128, which UTF-8 encodes as a single byte).
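In code, that working case looks like this:
#include <iostream>
int main() {
    char c = 47;     // ASCII 47 is '/'
    std::cout << c;  // prints a slash
    return 0;
}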
But if my number is 1120, the character should be 'Ѡ' (Cyrillic omega). I assume it is represented by several bytes, which cout would know to print to the screen as 'Ѡ'.
How do I get these bytes that represent 'Ѡ'?
I have a library called ICU, and I'm using UTF-8.
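For reference, in UTF-8 the code point 1120 (U+0460) is encoded as the two bytes 0xD1 0xA0. A small sketch that prints them, assuming the compiler's execution character set is UTF-8 (the default with GCC and Clang):
#include <cstdio>
int main() {
    const char* s = "\u0460"; // 'Ѡ', code point 1120
    for (const char* p = s; *p != '\0'; ++p)
        std::printf("0x%02X ", static_cast<unsigned char>(*p)); // prints 0xD1 0xA0
    return 0;
}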
Upvotes: 7
Views: 6100
Reputation: 149
Another alternative is to do it using only standard components. The following example treats the Unicode code point as a std::u32string and returns it as a std::string.
Creating a std::u32string with a Unicode code point is simple:
Method 1: using brace initialization (calling the std::initializer_list constructor)
std::u32string u1{codePointNumber};
// For example:
std::u32string u1{305}; // 305 is 'ı'
Method 2: using operator +=
std::u32string u2{}; // Empty string
// For example:
u2 += 305;
To convert the std::u32string to a std::string, you can use std::wstring_convert from the <locale> header together with std::codecvt_utf8 from <codecvt>:
#include <iostream>
#include <codecvt> // std::codecvt_utf8
#include <string>
#include <locale>  // std::wstring_convert

// Converts a UTF-32 string to its UTF-8 representation.
std::string U32ToStr(const std::u32string& str)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(str);
}

int main()
{
    std::u32string u1{305}; // code point 305 (U+0131) is 'ı'
    std::cout << U32ToStr(u1) << "\n";
    return 0;
}
Note that std::wstring_convert is deprecated in C++17 (and removed in C++26), so you may want to use an alternative method if you are using a newer version of C++.
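One standard-only alternative is to encode the code point by hand, following the UTF-8 bit layout. A minimal sketch (the name CodePointToUtf8 is mine, and it does not reject invalid input such as surrogate code points):
#include <string>

// Encodes a single Unicode code point as UTF-8 (1 to 4 bytes).
std::string CodePointToUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {            // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
CodePointToUtf8(1120) yields the same two-byte string ("\xD1\xA0") that U32ToStr produces for std::u32string{1120}.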
Upvotes: 0
Reputation: 33658
What you call a Unicode number is typically called a code point. If you want to work with C++ and Unicode strings, ICU offers an icu::UnicodeString class; see the ICU API documentation for details.
To create a UnicodeString holding a single character, you can use the constructor that takes a code point in a UChar32:
icu::UnicodeString::UnicodeString(UChar32 ch)
Then you can call the toUTF8String method to convert the string to UTF-8.
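Its declaration is a template over the destination string type (shown here simplified; check the unicode/unistr.h header of your ICU version):
template<typename StringClass> StringClass& toUTF8String(StringClass& result) const;
It appends the UTF-8 bytes to result and returns it, which is why the example below passes in an empty std::string.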
Example program:
#include <iostream>
#include <string>
#include <unicode/unistr.h>

int main() {
    // Code point 1120 (U+0460) is the Cyrillic capital letter omega.
    icu::UnicodeString uni_str((UChar32)1120);
    std::string str;
    uni_str.toUTF8String(str); // appends the UTF-8 encoding to str
    std::cout << str << std::endl;
    return 0;
}
On a Linux system like Debian, you can compile this program with:
g++ so.cc -o so -licuuc
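If ICU's pkg-config metadata is installed, you can also let pkg-config supply the compiler and linker flags (assuming the icu-uc module name shipped by current ICU releases):
g++ so.cc -o so $(pkg-config --cflags --libs icu-uc)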
If your terminal supports UTF-8, this will print the Cyrillic omega character 'Ѡ'.
Upvotes: 9