Reputation: 23
On Windows, the value of the Unicode character ö (Latin small letter o with diaeresis) in the CP437 character set is 148.
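On Linux, ö is encoded in UTF-8 as the two bytes 0xC3 0xB6, in that order. Stored in a signed char, they show up as -61 and -74. Read together as a little-endian 16-bit value (first byte low), they give 46787 (0xB6C3).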
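A small sketch that checks these numbers. The helper names are made up for illustration. It shows that -61/-74 are just the signed-char view of 0xC3/0xB6, and that 46787 is what you get when the two bytes are read as a little-endian word.

```cpp
#include <cstdint>
#include <string>

// UTF-8 encoding of U+00F6 (ö): the two bytes 0xC3 0xB6, in that order.
const std::string kUtf8OWithDiaeresis = "\xC3\xB6";

// A byte's value when held in a signed char (-61 for 0xC3, -74 for 0xB6).
int asSignedChar(unsigned char b) {
    return static_cast<signed char>(b);
}

// The first two bytes read as a little-endian 16-bit value (first byte = low byte).
std::uint16_t asLittleEndianWord(const std::string& bytes) {
    unsigned lo = static_cast<unsigned char>(bytes[0]);
    unsigned hi = static_cast<unsigned char>(bytes[1]);
    return static_cast<std::uint16_t>(lo | (hi << 8));
}
```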
My question is: how can I convert the CP437 byte 148 to UTF-8 in C++ on Linux?
More details about my problem are in this question:
open() function in Linux with extended characters (128-255) returns -1 error
Temporary solution:
C++11 supports the conversion to UTF-8 using codecvt_utf8
Upvotes: 0
Views: 9622
Reputation: 71
I wrote working code based on @remy-lebeau's answer. It uses the Win32 API, so it runs on Windows only. I hope it helps.
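#include <windows.h>
#include <string>

// Convert a byte string in the given code page (e.g. 437) to UTF-8. Windows only.
std::string cp2UTF8(int codePage, const char* in, int inlen) {
    // First convert the input from codePage to UTF-16 (wchar_t on Windows).
    int widelen = MultiByteToWideChar(codePage, 0, in, inlen, nullptr, 0);
    if (widelen <= 0) return std::string();
    std::wstring wide(widelen, L'\0');
    MultiByteToWideChar(codePage, 0, in, inlen, &wide[0], widelen);
    // Then convert UTF-16 to UTF-8 (the last two arguments must be NULL for CP_UTF8).
    int utf8len = WideCharToMultiByte(CP_UTF8, 0, wide.data(), widelen, nullptr, 0, nullptr, nullptr);
    if (utf8len <= 0) return std::string();
    std::string utf8(utf8len, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide.data(), widelen, &utf8[0], utf8len, nullptr, nullptr);
    return utf8;
}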
Upvotes: 0
Reputation: 2518
It isn't C++, but you can also convert a whole file from the shell with iconv:
$ iconv -f CP437 -t UTF-8 input_file_name.txt -o output_file_name.txt
Upvotes: 2
Reputation: 23
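I found this snippet for converting CP437 to UTF-8, and it works on Linux. Note that it only does the second half of the job: it takes the character's Unicode code point (for example U+00F6 for the CP437 byte 148) and encodes it as two UTF-8 bytes, which covers code points U+0080 to U+07FF.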
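#include <cstdint>

// Encode a code point in U+0080..U+07FF as two UTF-8 bytes packed into 16 bits,
// with the lead byte in the low 8 bits (a little-endian store emits them in order).
std::uint16_t toUtf8TwoBytes(std::uint16_t wChar)  // wChar: e.g. sCMResult.wChar
{
    std::uint8_t high = 0, low = 0;
    std::uint16_t result = 0;
    if (wChar >= 0x80 && wChar <= 0x7ff)      // the 2-byte range of UTF-8
    {
        low = 0xc0 | ((wChar >> 6) & 0x1f);   // lead byte: 110xxxxx
        high = 0x80 | (wChar & 0x3f);         // continuation byte: 10xxxxxx
        result = low | (high << 8);
    }
    return result;
}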
Full post can be found here
Upvotes: -1
Reputation: 595827
On Windows, you can use the Win32 MultiByteToWideChar() function to convert data from CP437 to UTF-16, and then use the WideCharToMultiByte() function to convert data from UTF-16 to UTF-8.
On Linux, you can use a Unicode conversion library, like libiconv or ICU (which are available for Windows, too).
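For example, on Linux with glibc, the POSIX iconv(3) interface can do the whole CP437-to-UTF-8 conversion in one call. This is a minimal sketch that assumes the C library recognizes the encoding name "CP437" (glibc does through its gconv modules; other iconv implementations may not). The function name cp437ToUtf8 is only illustrative.

```cpp
#include <iconv.h>
#include <stdexcept>
#include <string>

// Convert a CP437 byte string to UTF-8 with POSIX iconv(3).
// Assumes the C library knows the "CP437" encoding name.
std::string cp437ToUtf8(const std::string& in) {
    iconv_t cd = iconv_open("UTF-8", "CP437");  // iconv_open(tocode, fromcode)
    if (cd == reinterpret_cast<iconv_t>(-1))
        throw std::runtime_error("iconv_open failed: CP437 not supported");

    // Every CP437 character is in the BMP, so it needs at most 3 UTF-8 bytes.
    std::string out(in.size() * 3, '\0');
    char* inPtr = const_cast<char*>(in.data());
    std::size_t inLeft = in.size();
    char* outPtr = &out[0];
    std::size_t outLeft = out.size();

    std::size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        throw std::runtime_error("iconv conversion failed");

    out.resize(out.size() - outLeft);  // drop the unused tail of the buffer
    return out;
}
```

The same code also builds against GNU libiconv on other platforms, although some older libiconv versions declare the input pointer as const char**.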
In C++11 and later, you can use std::wstring_convert to first convert from CP437 to either UTF-16 or UTF-32/UCS-4 (if you can get or make a codecvt for CP437, that is), and then convert from UTF-16 or UTF-32/UCS-4 to UTF-8.
You can't use codecvt_utf8 to convert from CP437 to UTF-8 directly. It only supports conversions between UTF-8 and UCS-2 (not UTF-16!), and between UTF-8 and UTF-32/UCS-4. You have to use codecvt_utf8_utf16 for conversions between UTF-8 and UTF-16.
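A minimal sketch of the second step, assuming the CP437 byte has already been mapped to its Unicode code point (byte 148 maps to U+00F6). std::wstring_convert and the <codecvt> facets are deprecated since C++17, so this mainly suits existing code bases; the function names are illustrative.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// UTF-32 -> UTF-8 using codecvt_utf8<char32_t>.
std::string utf32ToUtf8(const std::u32string& s) {
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(s);
}

// UTF-16 -> UTF-8 using codecvt_utf8_utf16<char16_t>, which understands surrogate pairs.
std::string utf16ToUtf8(const std::u16string& s) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(s);
}
```

Using codecvt_utf8<char16_t> in the second function instead would treat the input as UCS-2 and reject surrogate pairs, which is the UCS-2 versus UTF-16 distinction made above.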
Or, you can use mbrtoc16() to convert CP437 to UTF-16 using a CP437 locale, and then use c16rtomb() to convert UTF-16 to UTF-8 using a UTF-8 locale (if your STL library implements a fix for DR488; otherwise c16rtomb() only supports UCS-2, not UTF-16!).
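A minimal sketch of the c16rtomb() half only, assuming a UTF-8 locale named "C.UTF-8" is installed (locale names vary between systems, so treat it as an assumption). It skips the mbrtoc16() half because a CP437 locale is rarely available on Linux. It returns an empty string if the locale cannot be set or the input is invalid, and note that setlocale() changes process-wide state.

```cpp
#include <climits>   // MB_LEN_MAX
#include <clocale>   // std::setlocale, LC_CTYPE
#include <cwchar>    // std::mbstate_t
#include <string>
#include <uchar.h>   // c16rtomb

// Convert UTF-16 text to UTF-8 through the C library, using a UTF-8 locale.
std::string utf16ToUtf8ViaLocale(const std::u16string& in) {
    if (std::setlocale(LC_CTYPE, "C.UTF-8") == nullptr) return "";  // locale missing
    std::mbstate_t state{};
    std::string out;
    char buf[MB_LEN_MAX];
    for (char16_t c : in) {
        std::size_t n = c16rtomb(buf, c, &state);
        if (n == static_cast<std::size_t>(-1)) return "";  // invalid input
        out.append(buf, n);  // n may be 0 for a high surrogate (with the DR488 fix)
    }
    return out;
}
```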
Otherwise, just create your own CP437-to-UTF-8 lookup table for the 256 possible CP437 bytes, and then do the conversion manually, one byte at a time.
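A lookup-table version might look like the sketch below. The table holds the Unicode code points for CP437 bytes 0x80 to 0xFF as given in the standard CP437-to-Unicode mapping. Bytes below 0x80 are assumed to be plain ASCII, so control codes stay control codes rather than becoming DOS glyphs such as ☺. The function names are illustrative.

```cpp
#include <string>

// Code points for CP437 bytes 0x80..0xFF; bytes 0x00..0x7F are treated as ASCII.
static const char16_t kCp437High[128] = {
    0x00C7, 0x00FC, 0x00E9, 0x00E2, 0x00E4, 0x00E0, 0x00E5, 0x00E7,  // 0x80
    0x00EA, 0x00EB, 0x00E8, 0x00EF, 0x00EE, 0x00EC, 0x00C4, 0x00C5,  // 0x88
    0x00C9, 0x00E6, 0x00C6, 0x00F4, 0x00F6, 0x00F2, 0x00FB, 0x00F9,  // 0x90
    0x00FF, 0x00D6, 0x00DC, 0x00A2, 0x00A3, 0x00A5, 0x20A7, 0x0192,  // 0x98
    0x00E1, 0x00ED, 0x00F3, 0x00FA, 0x00F1, 0x00D1, 0x00AA, 0x00BA,  // 0xA0
    0x00BF, 0x2310, 0x00AC, 0x00BD, 0x00BC, 0x00A1, 0x00AB, 0x00BB,  // 0xA8
    0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x2561, 0x2562, 0x2556,  // 0xB0
    0x2555, 0x2563, 0x2551, 0x2557, 0x255D, 0x255C, 0x255B, 0x2510,  // 0xB8
    0x2514, 0x2534, 0x252C, 0x251C, 0x2500, 0x253C, 0x255E, 0x255F,  // 0xC0
    0x255A, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256C, 0x2567,  // 0xC8
    0x2568, 0x2564, 0x2565, 0x2559, 0x2558, 0x2552, 0x2553, 0x256B,  // 0xD0
    0x256A, 0x2518, 0x250C, 0x2588, 0x2584, 0x258C, 0x2590, 0x2580,  // 0xD8
    0x03B1, 0x00DF, 0x0393, 0x03C0, 0x03A3, 0x03C3, 0x00B5, 0x03C4,  // 0xE0
    0x03A6, 0x0398, 0x03A9, 0x03B4, 0x221E, 0x03C6, 0x03B5, 0x2229,  // 0xE8
    0x2261, 0x00B1, 0x2265, 0x2264, 0x2320, 0x2321, 0x00F7, 0x2248,  // 0xF0
    0x00B0, 0x2219, 0x00B7, 0x221A, 0x207F, 0x00B2, 0x25A0, 0x00A0,  // 0xF8
};

// Append the UTF-8 encoding of a BMP code point (every CP437 character is in the BMP).
static void appendUtf8(std::string& out, char16_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert CP437 text to UTF-8, one byte at a time.
std::string cp437ToUtf8Manual(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 3);
    for (unsigned char b : in) {
        char16_t cp = b < 0x80 ? static_cast<char16_t>(b) : kCp437High[b - 0x80];
        appendUtf8(out, cp);
    }
    return out;
}
```

This needs no locale or external library, at the cost of carrying the 128-entry table in your own code.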
Upvotes: 6