Marius

Reputation: 2542

Indy message with Unicode Subject

I need to create an IdMessage with a Unicode subject (e.g. "本語 - test").

I have tried setting it using

Msg.Subject := UTF8Encode(subject);

where subject is a WideString containing the text above, but when I look at the encoded subject (by saving the message to a file) it looks like this:

Subject: =?UTF-8?Q?=C3=A6=C5=93=C2=AC=C3=A8=C2=AA=C5=BE?= - test

instead of

Subject: =?UTF-8?Q?=E6=0C=AC=E8=AA=9E?= - test

and Outlook displays it as "æœ¬èªž - test".

Any pointers as to where I am going wrong?

Delphi 2006 (pre-unicode), Indy 10 (fairly recent from source)

Upvotes: 1

Views: 3812

Answers (1)

Remy Lebeau

Reputation: 595827

In pre-Unicode versions of Delphi, where everything is based on AnsiString, the value you assign to the TIdMessage.Subject property (and any other AnsiString property of TIdMessage, for that matter) MUST be encoded using the OS default character encoding. You are encoding it to UTF-8 instead, which will not work. This is because TIdMessage first decodes the Subject value to Unicode using the OS default encoding, then MIME-encodes the Unicode data using the encoding parameters provided by the TIdMessage.OnInitializeISO event, or defaults if no event handler is assigned (in this case, those parameters are CharSet=UTF-8 and HeaderEncoding=QuotedPrintable).

TIdMessage has no mechanism to let you specify the encoding of the AnsiString data you assign to it. So the only way to send a value of '本語 - test' through the Subject property is to assign your source WideString as-is to the property and let the RTL convert the data to AnsiString using the OS default encoding:

Msg.Subject := subject;

However, if the OS default encoding does not support the Unicode characters being used, data will be lost. There is no avoiding that in this scenario.
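To see why assigning UTF8Encode(subject) produces the doubled octets from the question, here is a minimal console sketch that uses only the RTL, not Indy. It assumes a pre-Unicode Delphi (where UTF8Encode() returns an AnsiString-compatible value) and uses the implicit AnsiString-to-WideString conversion as a stand-in for the decode step TIdMessage performs. ToOctets is a hypothetical helper written for this illustration.

```delphi
program DoubleEncodeDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: renders each byte as a quoted-printable style "=XX" octet.
function ToOctets(const Bytes: AnsiString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(Bytes) do
    Result := Result + '=' + IntToHex(Ord(Bytes[I]), 2);
end;

var
  Subject, Misread: WideString;
  Utf8Bytes: AnsiString;
begin
  // U+672C and U+8A9E, written as code points so the source file encoding does not matter.
  SetLength(Subject, 2);
  Subject[1] := WideChar($672C);
  Subject[2] := WideChar($8A9E);

  // What the question does: hand UTF-8 bytes to an AnsiString property.
  Utf8Bytes := UTF8Encode(Subject);

  // Stand-in for TIdMessage's decode step: the bytes are interpreted
  // using the OS default code page, not UTF-8.
  Misread := Utf8Bytes;

  // Re-encoding the misread text as UTF-8 gives the doubled octets.
  WriteLn(ToOctets(UTF8Encode(Misread)));
end.
```

On a system whose default ANSI code page is Windows-1252, this should print =C3=A6=C5=93=C2=AC=C3=A8=C2=AA=C5=BE, matching the header in the question. Other code pages will produce different (but equally wrong) octets.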

The alternative is to set the Subject property to a blank string and then use the TIdMessage.ExtraHeaders property instead so that you can provide your own header value that will be put into the email as-is. Using this approach, you can call Indy's EncodeHeader() function directly. In pre-Unicode versions of Delphi, it has an optional ASrcEncoding parameter that defaults to the OS default encoding (TIdMessage does not currently provide a value for that parameter when encoding headers):

uses
  ..., IdCoderHeader;

Msg.Subject := '';
Msg.ExtraHeaders.Values['Subject'] := EncodeHeader(UTF8Encode(subject), '', 'Q', 'UTF-8', IndyTextEncoding_UTF8);

This way, EncodeHeader() will be able to avoid a redundant conversion because it can detect that the source and target character encodings are both UTF-8, and thus just MIME-encode the source UTF-8 data as-is. Worst case, even if it did not detect that the character encodings were the same, it would simply decode the source data to Unicode using UTF-8 and then re-encode it back to UTF-8. Those are lossless conversions, so no data is lost.

And FYI, the correct encoding for the Unicode characters you have shown would be:

Subject: =?UTF-8?Q?=E6=9C=AC=E8=AA=9E?= - test

Not

Subject: =?UTF-8?Q?=E6=0C=AC=E8=AA=9E?= - test

As you have shown. Notice the second encoded octet is 9C instead of 0C.
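If you want to check the octets yourself, a small console sketch like the following should do it. It assumes a pre-Unicode Delphi, uses only the RTL, and ToOctets is a hypothetical helper written for this illustration.

```delphi
program Utf8OctetsDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: renders each byte as a quoted-printable style "=XX" octet.
function ToOctets(const Bytes: AnsiString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(Bytes) do
    Result := Result + '=' + IntToHex(Ord(Bytes[I]), 2);
end;

var
  Subject: WideString;
begin
  // U+672C and U+8A9E, written as code points so the source file encoding does not matter.
  SetLength(Subject, 2);
  Subject[1] := WideChar($672C);
  Subject[2] := WideChar($8A9E);

  // Print the UTF-8 octets of the two characters.
  WriteLn(ToOctets(UTF8Encode(Subject)));
end.
```

This should print =E6=9C=AC=E8=AA=9E. U+672C encodes to E6 9C AC and U+8A9E encodes to E8 AA 9E, so the second octet is 9C.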

Upvotes: 6
