Reputation: 77
In C#, we have the following functions to convert a string to a UTF-8 encoded sequence of bytes and vice versa:
Encoding.UTF8.GetString(Byte[])
Encoding.UTF8.GetBytes(Char[]) / Encoding.UTF8.GetBytes(String)
I am trying to achieve the same thing in C++, as follows:
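#include <cstdint>
#include <string>
#include <vector>

// Builds a string from raw bytes (assumed to already be UTF-8).
std::string GetStringFromBytes(const std::vector<uint8_t>& bytes) {
    return std::string(bytes.begin(), bytes.end());
}

// Copies the string's bytes as-is; no charset conversion takes place.
std::vector<uint8_t> GetBytesFromString(const std::string& str) {
    return std::vector<uint8_t>(str.begin(), str.end());
}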
Is this approach correct? I'm assuming that the string that I'm converting is already in UTF-8 format.
Upvotes: 0
Views: 3515
Reputation: 595319
A C# string uses UTF-16, and thus requires a charset conversion to/from UTF-8. A C++ std::string does not use UTF-16 (std::u16string does); it is just a sequence of char with no encoding of its own. So, if you have a UTF-8 encoded std::string, you already have the raw bytes for it and can just copy them as-is. The code you have shown does exactly that, and is fine for UTF-8 strings. Otherwise, if you have or need a std::string encoded in some other charset, you will need a charset conversion to/from UTF-8. There are 3rd-party Unicode libraries that can handle that, such as libiconv and ICU.
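To make the "just copy the bytes" point concrete, here is a minimal sketch of a round trip. It assumes the std::string really does hold UTF-8; nothing in it validates that. RoundTripDemo is a hypothetical helper added only for illustration, and the sample text "héllo" is my own choice.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Copies the string's bytes as-is (assumes the string already holds UTF-8).
std::vector<uint8_t> GetBytesFromString(const std::string& str) {
    return std::vector<uint8_t>(str.begin(), str.end());
}

// Rebuilds a string from bytes that are assumed to be UTF-8.
std::string GetStringFromBytes(const std::vector<uint8_t>& bytes) {
    return std::string(bytes.begin(), bytes.end());
}

// Hypothetical demo helper: round-trips "héllo" and checks the byte layout.
void RoundTripDemo() {
    // 'é' (U+00E9) is written as its two UTF-8 bytes, 0xC3 0xA9, so the
    // example does not depend on the encoding of the source file.
    const std::string text = std::string("h\xC3\xA9") + "llo";

    const std::vector<uint8_t> bytes = GetBytesFromString(text);

    // Five characters, but six bytes, because 'é' takes two bytes in UTF-8.
    assert(bytes.size() == 6);
    assert(bytes[1] == 0xC3 && bytes[2] == 0xA9);

    // Copying the bytes back yields the identical string.
    assert(GetStringFromBytes(bytes) == text);
}
```

Note that size() counts bytes (code units), not characters, which is why the six-byte result for a five-character word is expected rather than a bug.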
Upvotes: 4