Reputation: 571
I have been trying to convert an ISO-8859-1 string to UTF-8 using the code from "Convert ISO-8859-1 strings to UTF-8 in C/C++". Here is my code:
#include <iostream>
#include <string>
using namespace std;
int main(int argc,char* argv[])
{
string fileName ="ħëlö";
int len= fileName.length();
char* in = new char[len+1];
char* out = new char[2*(len+1)];
memset(in,'\0',len+1);
memset(out,'\0',len+1);
memcpy(in,fileName.c_str(),2*(len+1));
while( *in )
{
cout << " ::: " << in ;
if( *in <128 )
{
*out++ = *in++;
}
else
{
*out++ = 0xc2+(*in>0xbf);
*out++ = (*in++&0x3f)+0x80;
}
}
cout << "\n\n out ::: " << out << "\n";
*out = '\0';
}
But the output is
::: ħëlö ::: ?ëlö ::: ëlö ::: ?lö ::: lö ::: ö ::: ?
out :::
The output 'out' should be a UTF-8 string, but it is not. I'm seeing this on Mac OS X.
What am I doing wrong here?
Upvotes: 0
Views: 1360
Reputation: 140220
ISO-8859-1 does not have the character ħ, so your source cannot possibly be in ISO-8859-1 as the method requires. Or your source is in ISO-8859-1, but ħ will be replaced with ? once you save it.
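One way to check which encoding the literal was actually saved in is to look at its raw bytes. The following is a minimal sketch, not a definitive tool; the helper name hexDump is hypothetical. If the source file is UTF-8, ħ (U+0127) appears as the two bytes c4 a7, whereas a file saved as ISO-8859-1 typically shows 3f (a question mark) in its place.

```cpp
#include <string>

// Hypothetical helper: render each byte of s as two lowercase hex digits,
// separated by single spaces.
std::string hexDump(const std::string& s) {
    static const char digits[] = "0123456789abcdef";
    std::string result;
    for (unsigned char byte : s) {   // unsigned so bytes >= 0x80 are not negative
        if (!result.empty()) {
            result += ' ';
        }
        result += digits[byte >> 4];    // high nibble
        result += digits[byte & 0x0f];  // low nibble
    }
    return result;
}
```

Printing hexDump(fileName) in the question's program would show whether the literal contains UTF-8 sequences or single ISO-8859-1 bytes.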
Upvotes: 1
Reputation: 399823
You are incrementing the out pointer in the loop, causing you to lose track of where the output starts. The pointer being passed to cout is the incremented one, so it no longer points at the start of the generated output.
Further, the termination of out happens after printing it, which is the wrong way around.
Also, this relies on the encoding of the source file, which is fragile. It is safer to express the input string explicitly, for example by writing the individual characters as hex escapes.
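The points above can be sketched as follows. This is one possible rewrite under stated assumptions, not the only fix: the function name latin1ToUtf8 is hypothetical, and it builds a std::string instead of advancing raw pointers, so the start of the output is never lost and no manual termination is needed. It also iterates over unsigned char, because plain char is signed on many platforms, in which case a test like *in < 128 is always true and the two-byte branch never runs.

```cpp
#include <string>

// Sketch: convert ISO-8859-1 (Latin-1) bytes to UTF-8.
// Assumes every input byte really is a Latin-1 code point (0x00..0xff).
std::string latin1ToUtf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);  // each input byte yields at most two output bytes
    for (unsigned char c : latin1) {  // unsigned, so 0x80..0xff compare as expected
        if (c < 0x80) {
            // ASCII range: copied unchanged.
            utf8 += static_cast<char>(c);
        } else {
            // 0x80..0xff: two-byte sequence 110000xx 10xxxxxx.
            utf8 += static_cast<char>(0xc0 | (c >> 6));    // lead byte, 0xc2 or 0xc3
            utf8 += static_cast<char>(0x80 | (c & 0x3f));  // continuation byte
        }
    }
    return utf8;
}
```

With the input written as hex escapes, for example latin1ToUtf8("\xebl\xf6") for ëlö, the result no longer depends on how the source file was saved.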
Upvotes: 2