Reputation: 375
Using modern C++ and the std library, what is the easiest and cleanest way to convert a std::string containing windows-1252 encoded characters to utf-8?
My use case is that I'm parsing a CSV file which is windows-1252 encoded, and then pushing some of its data to Node.js using Node-API (node-addon-api), which requires UTF-8 encoded strings.
Upvotes: 0
Views: 873
Reputation: 597941
Using just the standard library, the closest solution would probably be to use std::wstring_convert with a custom Windows-1252 facet to convert the std::string to a std::wstring, and then use std::wstring_convert with a standard UTF-8 facet to convert the std::wstring to a std::string.
However, std::wstring_convert has been deprecated since C++17, with no replacement in sight. So you are better off using a 3rd-party Unicode library to handle the conversion, such as iconv or ICU, or platform-specific APIs, like MultiByteToWideChar() and WideCharToMultiByte() on Windows.
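As a rough illustration of the library route, here is a minimal sketch using POSIX iconv. It assumes iconv is available and that the implementation recognizes the encoding name "WINDOWS-1252" (glibc and GNU libiconv do; other platforms may need "CP1252", a const char** input argument, or linking with -liconv). The function name cp1252_to_utf8_iconv is made up for this example.

```cpp
#include <iconv.h>

#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts Windows-1252 bytes to UTF-8 via iconv.
std::string cp1252_to_utf8_iconv(const std::string& in) {
    // iconv_open takes the target encoding first, then the source encoding.
    iconv_t cd = iconv_open("UTF-8", "WINDOWS-1252");
    if (cd == reinterpret_cast<iconv_t>(-1))
        throw std::runtime_error("iconv_open failed");

    // Every Windows-1252 byte becomes at most 3 UTF-8 bytes.
    std::string out(in.size() * 3, '\0');
    char* src = const_cast<char*>(in.data());  // iconv does not modify the input
    char* dst = &out[0];
    std::size_t src_left = in.size();
    std::size_t dst_left = out.size();

    std::size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        throw std::runtime_error("iconv conversion failed");

    // Trim the unused tail of the output buffer.
    out.resize(out.size() - dst_left);
    return out;
}
```

How bytes that Windows-1252 leaves undefined (such as 0x81) are handled depends on the iconv implementation; some report an error, in which case this sketch throws.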
Or, you could simply implement the conversion yourself. Windows-1252 is a very simple single-byte encoding with only 251 characters defined, so a trivial lookup table that converts each 8-bit character to its UTF-8 equivalent would suffice.
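A minimal sketch of such a lookup table might look like the following. Only bytes 0x80 to 0x9F need a table, because every other Windows-1252 byte has the same value as its Unicode code point. The names cp1252_to_utf8 and kCp1252High are made up for this example, and mapping the five undefined bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) to U+FFFD is an assumption; some converters map them to the C1 control characters instead.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Unicode code points for Windows-1252 bytes 0x80..0x9F.
// 0xFFFD (replacement character) marks the five undefined bytes; that choice
// is an assumption of this sketch, not something the encoding mandates.
constexpr std::array<std::uint16_t, 32> kCp1252High = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178};

// Hypothetical helper: converts Windows-1252 bytes to a UTF-8 string.
std::string cp1252_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char byte : in) {
        // Bytes outside 0x80..0x9F map directly to the same code point.
        std::uint32_t cp = byte;
        if (byte >= 0x80 && byte <= 0x9F) cp = kCp1252High[byte - 0x80];

        if (cp < 0x80) {            // 1-byte UTF-8 sequence (ASCII)
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 2-byte UTF-8 sequence
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 3-byte UTF-8 sequence
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

No code point in the table exceeds U+FFFF, so 4-byte UTF-8 sequences never occur and the encoder can stop at three bytes. The returned std::string can then be handed to node-addon-api wherever it expects UTF-8.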
Upvotes: 0