Zeus

Reputation: 571

ISO-8859 to UTF-8 Conversion C++

I have been trying to convert an ISO-8859 string to UTF-8 using the code from "Convert ISO-8859-1 strings to UTF-8 in C/C++". Here is my code:

#include <cstring>
#include <iostream>
#include <string>

using namespace std;
int main(int argc,char* argv[])
{
    string fileName ="ħëlö";
    int len= fileName.length();
    char* in = new char[len+1];
    char* out = new char[2*(len+1)];
    memset(in,'\0',len+1);
    memset(out,'\0',len+1);
    memcpy(in,fileName.c_str(),2*(len+1));


    while( *in )
    {
            cout << " ::: " << in ;
            if( *in <128 )
            {
                    *out++ = *in++;
            }
            else
            {
                    *out++ = 0xc2+(*in>0xbf);
                    *out++ = (*in++&0x3f)+0x80;
            }
    }
    cout << "\n\n out ::: " << out << "\n";
    *out = '\0';
}

But the output is

::: ħëlö ::: ?ëlö ::: ëlö ::: ?lö ::: lö ::: ö ::: ?

 out :::   

The output 'out' should be a UTF-8 string, but it is not. I'm seeing this on Mac OS X.

What am I doing wrong here?

Upvotes: 0

Views: 1360

Answers (2)

Esailija

Reputation: 140220

ISO-8859-1 does not have the character ħ, so your source cannot possibly be in ISO-8859-1 as the method requires. Alternatively, if your source file really is saved as ISO-8859-1, ħ will be replaced with ? as soon as you save it.
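One way to check which bytes the compiler actually saw for the literal is to dump them in hex. The following is only a sketch: toHex is a hypothetical helper name, not part of the question's code, and it assumes a C++11 compiler. If the source file was saved as UTF-8, ħ would appear as the two bytes C4 A7, and a Latin-1 to UTF-8 converter would then encode each of those bytes a second time.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render every byte of s as two uppercase hex digits
// followed by a space, so the real encoding of a literal becomes visible.
std::string toHex(const std::string& s)
{
    std::string out;
    char buf[4];                          // two hex digits, a space, and '\0'
    for (unsigned char c : s)             // unsigned, so bytes >= 0x80 stay positive
    {
        std::snprintf(buf, sizeof buf, "%02X ", static_cast<unsigned>(c));
        out += buf;
    }
    return out;
}
```

Printing toHex(fileName) before converting shows whether the input really consists of single bytes in the range 0x80 to 0xFF, which is what the conversion loop expects.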

Upvotes: 1

unwind

Reputation: 399823

You are incrementing the out pointer inside the loop, so you lose track of where the output starts. The pointer passed to cout is the incremented one, so it points at the end of the generated output rather than at its start.

Further, out is terminated only after it has been printed, which is the wrong way around.

Also, this relies on the encoding of the source file, which is fragile. To be on the safe side, express the input string differently, for example as individual characters written with explicit hex values.
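A minimal sketch of how these points could be addressed together, assuming a C++11 compiler and assuming the input really is ISO-8859-1 (one byte per character). The function name latin1ToUtf8 is made up for illustration. Building the result in a std::string avoids both the lost start pointer and the manual termination. Reading each byte as unsigned char also matters: where char is signed, a test such as *in < 128 is true for every byte, so the two-byte branch is never taken.

```cpp
#include <string>

// Sketch: convert ISO-8859-1 (Latin-1) bytes to UTF-8.
// Assumes every input byte is a Latin-1 code point (U+0000 to U+00FF).
std::string latin1ToUtf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size() * 2);           // worst case: two bytes per character
    for (unsigned char c : in)            // unsigned, so bytes >= 0x80 are not negative
    {
        if (c < 0x80)
        {
            out += static_cast<char>(c);  // ASCII maps to itself
        }
        else
        {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte: 0xC2 or 0xC3
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}
```

With the input written as hex escapes, for example "h\xEBl\xF6" for hëlö in ISO-8859-1, the result does not depend on how the source file is saved. ë (0xEB) becomes the bytes C3 AB and ö (0xF6) becomes C3 B6. The character ħ cannot be written this way because it has no ISO-8859-1 code.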

Upvotes: 2
