peiman F.

Reputation: 1658

Base64 encoding for UTF-8 strings

I have RAD Studio XE5, and I used Indy's EncodeString to base64-encode the input string.

My code is:

procedure TForm5.Button2Click(Sender: TObject);
var
  UTF8: UTF8String;
begin
  UTF8 := UTF8Encode(m1.Text);
  m2.Text := ind.EncodeString(UTF8);
end;

But the output is wrong for UTF-8 (non-ASCII) input:

orange          --> b3Jhbmdl  [correct]
book            --> Ym9vaw==  [correct]
سلام ("hello")   --> Pz8/Pw==  [wrong]
کتاب ("book")    --> Pz8/Pw==  [wrong]
دلفی ("Delphi")  --> Pz8/Pw==  [wrong]

For every UTF-8 input it returns the same output! What is wrong with my code, and how can I get correct base64 encoding for UTF-8 strings?

Upvotes: 7

Views: 13662

Answers (3)

Astghik

Reputation: 21

For RAD Studio 10 (C++Builder):

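#include <IdGlobal.hpp>

// Armenian for "thank you"; String is a UTF-16 UnicodeString in C++Builder
String my_str = L"Շնորհակալություն";
// Encode: the text is converted to UTF-8 octets, which are then base64-encoded
String str = IdEncoderMIME1->EncodeString(my_str, IndyTextEncoding_UTF8());
// Decode: base64 back to octets, which are interpreted as UTF-8
my_str = IdDecoderMIME1->DecodeString(str, IndyTextEncoding_UTF8());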

Upvotes: 2

Remy Lebeau

Reputation: 596713

As @RRUZ said, EncodeString() expects you to specify the byte encoding that the input String is converted to; the resulting octets are then base64-encoded.

You are passing a UTF8String to EncodeString(), which takes a UnicodeString as input in XE5, so the RTL converts the UTF8String data back to UTF-16, undoing your UTF8Encode() call (which is deprecated, BTW). Because you do not specify a byte encoding, Indy then uses its default encoding, which is ASCII unless you change it (configurable via the GIdDefaultTextEncoding variable in the IdGlobal unit).

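That is why orange works (it is pure ASCII, so no data is lost) but سلام fails (data loss): ASCII cannot represent its characters, so each one is replaced with '?'. Every four-letter Persian word in the question therefore becomes "????", which base64-encodes to Pz8/Pw==.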
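To see why all three Persian words produce the same output, here is a minimal standard C++ sketch (not Indy code; it runs without RAD Studio). It models the lossy ASCII conversion by replacing every UTF-16 code unit above 0x7F with '?', which is an assumption inferred from the observed output rather than Indy's actual source, and then base64-encodes the result. The helper names base64Encode and toAsciiLossy are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Standard base64 alphabet (RFC 4648).
static const char kB64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Base64-encodes raw octets, padding the final group with '='.
std::string base64Encode(const std::string& bytes) {
    std::string out;
    std::size_t i = 0;
    for (; i + 2 < bytes.size(); i += 3) {
        unsigned n = (static_cast<unsigned char>(bytes[i]) << 16) |
                     (static_cast<unsigned char>(bytes[i + 1]) << 8) |
                     static_cast<unsigned char>(bytes[i + 2]);
        out += kB64[(n >> 18) & 63];
        out += kB64[(n >> 12) & 63];
        out += kB64[(n >> 6) & 63];
        out += kB64[n & 63];
    }
    std::size_t rest = bytes.size() - i;
    if (rest == 1) {
        unsigned n = static_cast<unsigned char>(bytes[i]) << 16;
        out += kB64[(n >> 18) & 63];
        out += kB64[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {
        unsigned n = (static_cast<unsigned char>(bytes[i]) << 16) |
                     (static_cast<unsigned char>(bytes[i + 1]) << 8);
        out += kB64[(n >> 18) & 63];
        out += kB64[(n >> 12) & 63];
        out += kB64[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// Hypothetical model of a lossy UTF-16 -> ASCII conversion: every code unit
// outside 0x00..0x7F becomes '?', so the original characters are gone.
std::string toAsciiLossy(const std::u16string& text) {
    std::string out;
    for (char16_t c : text)
        out += (c < 0x80) ? static_cast<char>(c) : '?';
    return out;
}
```

With this model, orange and book encode to b3Jhbmdl and Ym9vaw==, while سلام, کتاب and دلفی all collapse to "????" before encoding and come out as Pz8/Pw==, matching the results in the question.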

You need to get rid of your UTF8String altogether, and let Indy handle the UTF-8 for you:

procedure TForm5.Button2Click(Sender: TObject);
begin
  // Convert the UnicodeString to UTF-8 octets, then base64-encode them
  m2.Text := TIdEncoderMIME.EncodeString(m1.Text, IndyTextEncoding_UTF8);
end;
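The fix works because the text is turned into UTF-8 octets before base64 encoding. The following standard C++ sketch (again not Indy; utf16ToUtf8 and base64Encode are hypothetical helpers) reproduces that order of operations, starting from UTF-16 text the way a Delphi UnicodeString stores it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Standard base64 alphabet (RFC 4648).
static const char kB64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Base64-encodes raw octets, padding the final group with '='.
std::string base64Encode(const std::string& bytes) {
    std::string out;
    std::size_t i = 0;
    for (; i + 2 < bytes.size(); i += 3) {
        unsigned n = (static_cast<unsigned char>(bytes[i]) << 16) |
                     (static_cast<unsigned char>(bytes[i + 1]) << 8) |
                     static_cast<unsigned char>(bytes[i + 2]);
        out += kB64[(n >> 18) & 63];
        out += kB64[(n >> 12) & 63];
        out += kB64[(n >> 6) & 63];
        out += kB64[n & 63];
    }
    std::size_t rest = bytes.size() - i;
    if (rest > 0) {
        unsigned n = static_cast<unsigned char>(bytes[i]) << 16;
        if (rest == 2) n |= static_cast<unsigned char>(bytes[i + 1]) << 8;
        out += kB64[(n >> 18) & 63];
        out += kB64[(n >> 12) & 63];
        out += (rest == 2) ? kB64[(n >> 6) & 63] : '=';
        out += '=';
    }
    return out;
}

// Converts UTF-16 text to UTF-8 octets, combining surrogate pairs.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16ToUtf8(const std::u16string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t cp = s[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;  // consumed the low surrogate too
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // lone surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Under this sketch سلام becomes the eight octets D8 B3 D9 84 D8 A7 D9 85 and encodes to 2LPZhNin2YU=, while ASCII input such as orange still gives b3Jhbmdl, because ASCII and UTF-8 produce identical octets for those characters.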

DecodeString() has a similar parameter for specifying the byte encoding of the octets that were base64-encoded. The input is first decoded to bytes, and those bytes are then converted to a UnicodeString using the specified byte encoding, e.g.:

procedure TForm5.Button3Click(Sender: TObject);
begin
  // Decode base64 to octets, then interpret the octets as UTF-8
  m1.Text := TIdDecoderMIME.DecodeString(m2.Text, IndyTextEncoding_UTF8);
end;
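The reverse direction can be sketched the same way in standard C++: the base64 text is first decoded to octets, and only then are the octets interpreted as UTF-8. base64Decode and utf8ToUtf16 are hypothetical helpers, not Indy functions, and the UTF-8 decoder is a minimal sketch that assumes well-formed input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Maps a base64 character to its 6-bit value, or -1 if it is not in the
// standard alphabet.
int b64Value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Decodes base64 text to raw octets. Stops at '=' padding and skips any
// character outside the alphabet (for example line breaks).
std::string base64Decode(const std::string& text) {
    std::string out;
    unsigned buffer = 0;  // pending bits, at most 13 at a time
    int bits = 0;
    for (char c : text) {
        if (c == '=') break;
        int v = b64Value(c);
        if (v < 0) continue;
        buffer = (buffer << 6) | static_cast<unsigned>(v);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out += static_cast<char>((buffer >> bits) & 0xFF);
            buffer &= (1u << bits) - 1;  // keep only the unused low bits
        }
    }
    return out;
}

// Decodes UTF-8 octets to UTF-16, emitting surrogate pairs above U+FFFF.
// Assumes well-formed input; overlong or truncated sequences are not rejected.
std::u16string utf8ToUtf16(const std::string& bytes) {
    std::u16string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        int extra;
        if (b < 0x80) { cp = b; extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else { cp = b & 0x07; extra = 3; }
        ++i;
        for (int k = 0; k < extra && i < bytes.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

Decoding 2LPZhNin2YU= with this sketch yields سلام again, while decoding the question's Pz8/Pw== yields four literal question marks, which confirms that the characters were already lost before base64 encoding took place.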

Upvotes: 11

RRUZ

Reputation: 136421

You must call the EncodeString method passing a proper byte encoding class.

Try this:

m2.Text := TIdEncoderMIME.EncodeString(UTF8, IndyUTF8Encoding);

(IndyUTF8Encoding is defined in the IdGlobal unit.)

Upvotes: 5
